Accelerating Biclique Counting on GPU

Linshan Qiu, Zhonggen Li, Xiangyu Ke, Lu Chen, Yunjun Gao

TL;DR

This paper introduces a novel data structure that hashes adjacency lists into truncated bitmaps, enabling efficient set intersection on GPUs via bit-wise AND operations, together with a composite load balancing strategy that integrates pre-runtime and runtime workload allocation to ensure equitable distribution among threads.

Abstract

Counting (p,q)-bicliques in bipartite graphs poses a foundational challenge with broad applications, from densest subgraph discovery in algorithmic research to personalized content recommendation in practical scenarios. Despite its significance, current leading (p,q)-biclique counting algorithms fall short, particularly when faced with larger graph sizes and biclique scales. Fortunately, the problem's inherent structure, which allows bicliques to be counted independently starting from each vertex, combined with a substantial number of set intersections, makes it highly amenable to parallelization. Recent successes in GPU-accelerated algorithms across various domains motivate our exploration into harnessing the parallel processing power of GPUs to efficiently address the (p,q)-biclique counting challenge. We introduce GBC (GPU-based Biclique Counting), a novel approach designed to enable efficient and scalable (p,q)-biclique counting on GPUs. To address the major bottleneck arising from redundant comparisons in set intersections (occupying an average of 90% of the runtime), we introduce a novel data structure that hashes adjacency lists into truncated bitmaps to enable efficient set intersection on GPUs via bit-wise AND operations. Our innovative hybrid DFS-BFS exploration strategy further enhances thread utilization and effectively manages memory constraints. A composite load balancing strategy, integrating pre-runtime and runtime workload allocation, ensures equitable distribution among threads. Additionally, we employ vertex reordering and graph partitioning strategies for improved compactness and scalability. Experimental evaluations on eight real-life and two synthetic datasets demonstrate that GBC outperforms state-of-the-art algorithms by a substantial margin. In particular, GBC achieves an average speedup of 497.8x, with the largest instance achieving a remarkable 1217.7x speedup when p = q = 8.
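
To make the bitmap idea in the abstract concrete, the following is a minimal CPU-side sketch in Python of intersecting two adjacency lists with bit-wise AND. It is not the paper's data structure or its GPU kernel. The layout is an assumption made for illustration: each neighbor ID `v` is hashed to word `v // WORD_BITS` and bit `v % WORD_BITS`, and only non-empty words are stored (the "truncation"). The names `WORD_BITS`, `to_truncated_bitmap`, `intersect`, `count`, and `to_vertices` are hypothetical.

```python
from typing import Dict, Iterable, List

WORD_BITS = 32  # assumed word width (hypothetical; chosen to mirror a 32-bit GPU word)


def to_truncated_bitmap(neighbors: Iterable[int]) -> Dict[int, int]:
    """Hash each neighbor ID into (word index, bit position); keep only non-empty words."""
    bitmap: Dict[int, int] = {}
    for v in neighbors:
        word_id, bit = divmod(v, WORD_BITS)
        bitmap[word_id] = bitmap.get(word_id, 0) | (1 << bit)
    return bitmap


def intersect(a: Dict[int, int], b: Dict[int, int]) -> Dict[int, int]:
    """Intersect two bitmaps: one AND per shared word instead of per-element comparisons."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the bitmap with fewer stored words
    result: Dict[int, int] = {}
    for word_id, word in a.items():
        common = word & b.get(word_id, 0)
        if common:  # drop empty words so the result stays truncated
            result[word_id] = common
    return result


def count(bitmap: Dict[int, int]) -> int:
    """Number of vertices in the set (population count over all stored words)."""
    return sum(bin(word).count("1") for word in bitmap.values())


def to_vertices(bitmap: Dict[int, int]) -> List[int]:
    """Decode a bitmap back into a sorted list of vertex IDs."""
    vertices: List[int] = []
    for word_id in sorted(bitmap):
        word = bitmap[word_id]
        for bit in range(WORD_BITS):
            if word & (1 << bit):
                vertices.append(word_id * WORD_BITS + bit)
    return vertices


if __name__ == "__main__":
    adj_u = [1, 3, 40, 41, 200]    # made-up adjacency list of vertex u
    adj_v = [3, 40, 77, 200, 201]  # made-up adjacency list of vertex v
    shared = intersect(to_truncated_bitmap(adj_u), to_truncated_bitmap(adj_v))
    print(to_vertices(shared), count(shared))  # [3, 40, 200] 3
```

In this sketch, up to `WORD_BITS` candidate neighbors are compared with a single AND, which is the property the abstract attributes to the bitmap representation. On a GPU the per-word loop would presumably be spread across threads, but that mapping is not specified in the text above.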

Paper Structure

This paper contains 25 sections, 1 theorem, 11 figures, 5 tables, 3 algorithms.

Key Result

Lemma 1

Obtaining the optimal graph partitioning result is an NP-hard problem.

Figures (11)

  • Figure 1: An example of a bipartite graph and the time breakdown of BCL ($\clubsuit$ and $\spadesuit$ denote searching for shared 1-hop and 2-hop neighbors, respectively).
  • Figure 2: A walk-through example of the basic model (Basic).
  • Figure 3: Hybrid DFS-BFS exploration in GBC (for brevity, nodes in the search tree are denoted by the vertices newly added on the selected layer).
  • Figure 4: Intersection with different data structures.
  • Figure 5: A running example of Border.
  • ...and 6 more figures

Theorems & Definitions (12)

  • Example 1
  • Definition 1: (p,q)-Biclique [yang2021p]
  • Example 2
  • Example 3
  • Definition 2: Vertex Priority
  • Example 4
  • Example 5
  • Example 6
  • Example 7
  • Example 8
  • ...and 2 more