Accelerating Maximal Biclique Enumeration on GPUs

Chou-Ying Hsieh, Chia-Ming Chang, Po-Hsiu Cheng, Sy-Yen Kuo

TL;DR

Maximal biclique enumeration (MBE) is a computationally intensive task on large bipartite graphs. The paper introduces cuMBE, a GPU-optimized MBE algorithm that eliminates recursion through a compact array, enabling DFS-style enumeration with reduced memory overhead. It combines coarse-grained thread-block parallelism, three fine-grained optimizations (early-stop, reverse scanning, and lookup tables), and a k-level work-stealing scheme to balance the workload. On common and real-world datasets, cuMBE achieves geometric mean speedups of 4.02x and 4.13x over the state-of-the-art serial and parallel CPU-based algorithms, respectively, demonstrating the practicality and scalability of GPU-based MBE.
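The early-stop and lookup-table ideas can be illustrated with a small sequential sketch. It is our own assumption of one plausible form, not cuMBE's kernel: a candidate v is classified by how many of its neighbors lie in the current set L' (all, some, or none, matching how the search in Figure 1 sorts candidates), membership is a single comparison against a position lookup table in the spirit of Figure 4, and the scan stops as soon as the label can no longer change. The function `classify_candidate` and the labels `NONE`, `PARTIAL`, `FULL` are hypothetical names.

```python
from typing import Dict, Sequence

# Hypothetical outcome labels for a candidate v relative to L' (names are ours).
NONE, PARTIAL, FULL = 0, 1, 2


def classify_candidate(neighbors: Sequence[int], lt: Dict[int, int], size_l: int) -> int:
    """Classify v by |N(v) & L'| with a position lookup table, stopping early.

    lt maps a vertex to its slot in an array whose first size_l slots hold L',
    so `lt[u] < size_l` is an O(1) membership test. Assumes size_l >= 1 and
    that `neighbors` contains no duplicates.
    """
    hits = 0
    for i, u in enumerate(neighbors):
        pos = lt.get(u)
        if pos is not None and pos < size_l:
            hits += 1
            if hits == size_l:
                return FULL  # v is adjacent to every vertex of L'
        remaining = len(neighbors) - i - 1
        if hits > 0 and hits + remaining < size_l:
            return PARTIAL  # some overlap, and FULL is no longer reachable
    return PARTIAL if hits > 0 else NONE
```

The early exit is safe because it fires only when the result is already fixed: `FULL` once every vertex of L' has been seen, and `PARTIAL` once there is at least one hit but too few neighbors remain to reach |L'|. Telling `NONE` apart from `PARTIAL` still requires scanning until a first hit appears.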

Abstract

Maximal Biclique Enumeration (MBE) holds critical importance in graph theory, with applications in fields such as bioinformatics, social networks, and recommendation systems. However, its computational complexity makes it difficult to scale efficiently to large graphs. To address this challenge, we introduce cuMBE, a GPU-optimized parallel algorithm for MBE. Using a unique data structure called a compact array, cuMBE eliminates the need for recursion, significantly reducing dynamic memory requirements and computational overhead. The algorithm uses a hybrid parallelism approach in which GPU thread blocks handle coarse-grained tasks, each associated with part of the search process. In addition, we implement three fine-grained optimizations within each thread block to enhance performance. Further, we integrate a work-stealing mechanism to mitigate workload imbalance among thread blocks. Our experiments reveal that cuMBE achieves geometric mean speedups of 4.02x and 4.13x over the state-of-the-art serial algorithm and parallel CPU-based algorithm, respectively, on both common and real-world datasets.
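To illustrate how an explicit data structure can replace recursion, the following Python sketch runs a basic MBEA-style search over the four sets L, R, P, and Q described in Figure 1. It keeps one (L, R, P, Q) frame per search level on an explicit stack instead of making recursive calls. This is our own sequential illustration, not cuMBE's compact array or its GPU kernel. The function `enumerate_maximal_bicliques` and the adjacency-dict input format are hypothetical, and the sketch omits the advanced techniques compared in Figure 2.

```python
from typing import Dict, List, Set, Tuple

Biclique = Tuple[frozenset, frozenset]


def enumerate_maximal_bicliques(adj: Dict[int, Set[int]], left: Set[int]) -> List[Biclique]:
    """Recursion-free sketch of basic MBEA-style enumeration (hypothetical helper).

    adj maps each right-side vertex to its set of left-side neighbors.
    Returns (L, R) pairs: L is a subset of `left`, R a set of right-side vertices.
    """
    results: List[Biclique] = []
    # Initial candidates: right-side vertices with at least one neighbor.
    p0 = [v for v in sorted(adj) if adj[v]]
    # Each frame is (L, R, P, Q) for one search level. P and Q are lists that
    # are updated in place, so this stack stands in for the call stack.
    stack = [(set(left), set(), p0, [])]
    while stack:
        L, R, P, Q = stack[-1]
        if not P:
            stack.pop()  # level exhausted: equivalent to returning to the caller
            continue
        x = P[0]
        L_new = L & adj[x]  # left vertices adjacent to every vertex of R plus x
        R_new = R | {x}
        P_new: List[int] = []
        Q_new: List[int] = []
        maximal = True
        # If a previously tried vertex is adjacent to all of L_new, this branch
        # is not maximal (or was already enumerated), so it is pruned.
        for v in Q:
            n = len(L_new & adj[v])
            if n == len(L_new):
                maximal = False
                break
            if n > 0:
                Q_new.append(v)
        if maximal:
            for v in P[1:]:
                n = len(L_new & adj[v])
                if n == len(L_new):
                    R_new.add(v)  # fully connected to L_new: joins R directly
                elif n > 0:
                    P_new.append(v)  # partially connected: stays a candidate
            results.append((frozenset(L_new), frozenset(R_new)))
        # Move x from P to Q at this level before descending into the child.
        P.pop(0)
        Q.append(x)
        if maximal and P_new:
            stack.append((L_new, R_new, P_new, Q_new))
    return results
```

For example, with `left = {1, 2, 3}` and `adj = {10: {1, 2}, 11: {2, 3}, 12: {1, 2, 3}}`, the sketch returns the four maximal bicliques ({1, 2}, {10, 12}), ({2}, {10, 11, 12}), ({2, 3}, {11, 12}), and ({1, 2, 3}, {12}) in that order. Because the child frame is pushed on top of the parent, the visiting order is the same depth-first order a recursive version would produce.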

Paper Structure (20 sections, 1 equation, 6 figures, 1 table, 2 algorithms)

Figures (6)

  • Figure 1: (a) Example bipartite graph. (b) The relationship among the four sets in the state-of-the-art MBEA. The $P$ set stores the candidate vertices to be added to $R$, and the $R$ set induces $L$; the $Q$ set checks the maximality of $L$, while $R$ is expanded to a maximal set by moving vertices from $P$ to $R$.
  • Figure 2: The MBEA with different advanced techniques on the example bipartite graph shown in Figure 1(a). Each circle represents a checking process. The letter inside a circle is the vertex $x$ selected in that process. The maximal biclique found is represented as $B = \{R' \cup L'\}$.
  • Figure 3: An example of moving vertices in the candidate set $P$ using the compact array and lookup table. $P'$ is the next-level candidate set. Assuming the blue thread executes its atomic swap first, it is responsible for updating the positions of vertices $4$ and $6$ in the lookup table with the indices colored in blue.
  • Figure 4: The early-stop technique in the candidate selection phase, using the lookup table $LT^{u}_{L}$ to quickly determine whether a neighboring vertex $u$ is in the $L$ set.
  • Figure 5: The load distribution among thread blocks across different datasets on an NVIDIA RTX 3090 GPU.
  • ...and 1 more figure
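Figure 3 describes moving vertices within $P$ by swapping them and keeping a lookup table of positions up to date. Below is a minimal sequential sketch of that idea under our own assumptions: no atomics or GPU threads, a Python dict as the lookup table, and the hypothetical names `CompactArray`, `narrow`, and `restore`. Kept vertices are swapped to the front so that the next-level set $P'$ is a prefix of the array. Every swap updates both affected lookup-table entries, much as the blue thread updates vertices 4 and 6 in the figure.

```python
from typing import Callable, Dict, List


class CompactArray:
    """Hypothetical sketch of a swap-based compact array with a lookup table.

    arr[0:size] holds the current set and lt[v] gives v's slot in arr, so
    membership is one comparison. Narrowing only permutes arr[0:size], so
    every enclosing level is still a prefix of arr.
    """

    def __init__(self, items: List[int]) -> None:
        self.arr = list(items)
        self.lt: Dict[int, int] = {v: i for i, v in enumerate(self.arr)}
        self.size = len(self.arr)
        self._saved: List[int] = []  # sizes of the enclosing levels

    def contains(self, v: int) -> bool:
        i = self.lt.get(v)
        return i is not None and i < self.size

    def current(self) -> List[int]:
        return self.arr[: self.size]

    def _swap(self, i: int, j: int) -> None:
        # Exchange two slots and fix both lookup-table entries.
        a, b = self.arr[i], self.arr[j]
        self.arr[i], self.arr[j] = b, a
        self.lt[a], self.lt[b] = j, i

    def narrow(self, keep: Callable[[int], bool]) -> None:
        """Descend one level: move kept vertices to the front of arr[0:size]."""
        self._saved.append(self.size)
        k = 0
        for i in range(self.size):
            if keep(self.arr[i]):
                self._swap(i, k)
                k += 1
        self.size = k

    def restore(self) -> None:
        """Return to the enclosing level in O(1) by restoring its saved size."""
        self.size = self._saved.pop()
```

For example, narrowing `CompactArray([3, 4, 5, 6, 7])` with `lambda v: v % 2 == 0` leaves `current()` equal to `[4, 6]`. Because narrowing never moves a vertex outside the current prefix, `restore()` recovers the parent set without copying. That is one way an array can stand in for per-level set copies in a recursion-free DFS.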