Parallelizing Maximal Clique Enumeration on GPUs
Mohammad Almasri, Yen-Hsiang Chang, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-mei Hwu
TL;DR
This work targets exact maximal clique enumeration (MCE) on GPUs by parallelizing the Bron-Kerbosch search-tree traversal. It introduces per-block depth-first traversal of independent subtrees, a worker list for dynamic load balancing, and, to curb memory usage, partial induced subgraphs and a compact two-part representation of the excluded-vertex (X) sets. On a single GPU, the implementation outperforms the state-of-the-art parallel CPU implementation by a geometric mean of 4.9x (up to 16.7x), and it scales efficiently to multiple GPUs while keeping load-balancing and data-management overheads low. The authors have open-sourced the code to support benchmarking and further research.
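To make the search-tree decomposition concrete, here is a minimal sequential Python sketch. It is an illustration under stated assumptions, not the authors' GPU code. R is the clique being grown, P the candidate vertices, and X the excluded vertices (those whose cliques were already explored). The sketch assumes Tomita-style pivoting and splits the tree into one independent subtree per vertex of a given ordering; each such subtree is the kind of unit a GPU block could traverse depth-first. It also adopts a hypothetical reading of "partial induced subgraph": the subgraph on P ∪ X with X-X edges dropped, since Bron-Kerbosch never queries them. All function names are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def bron_kerbosch_pivot(adj: Graph, R: List[int], P: Set[int], X: Set[int],
                        out: List[List[int]]) -> None:
    """Depth-first Bron-Kerbosch with pivoting; appends maximal cliques to `out`."""
    if not P and not X:
        out.append(sorted(R))  # nothing can extend R: it is maximal
        return
    # Pivot u maximizes |P ∩ N(u)|; only candidates not adjacent to u need a branch.
    u = max(P | X, key=lambda w: len(P & adj[w]))
    for v in sorted(P - adj[u]):
        bron_kerbosch_pivot(adj, R + [v], P & adj[v], X & adj[v], out)
        P.remove(v)  # v's cliques are done: move it from P to X
        X.add(v)


def partial_induced_subgraph(adj: Graph, P: Set[int], X: Set[int]) -> Graph:
    """Subgraph on P ∪ X keeping only edges with an endpoint in P (X-X edges dropped)."""
    keep = P | X
    return {w: adj[w] & (keep if w in P else P) for w in keep}


def enumerate_by_subtrees(adj: Graph, order: List[int]) -> List[List[int]]:
    """Enumerate maximal cliques as one independent subtree per vertex of `order`."""
    rank = {v: i for i, v in enumerate(order)}
    cliques: List[List[int]] = []
    for v in order:  # iterations share no state, so they could run in parallel
        P = {w for w in adj[v] if rank[w] > rank[v]}  # later neighbors: candidates
        X = {w for w in adj[v] if rank[w] < rank[v]}  # earlier neighbors: excluded
        sub = partial_induced_subgraph(adj, P, X)
        bron_kerbosch_pivot(sub, [v], P, X, cliques)
    return cliques


if __name__ == "__main__":
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
    print(enumerate_by_subtrees(adj, [0, 1, 2, 3, 4]))  # [[0, 1, 2], [2, 3], [4]]
```

Each subtree reports exactly the maximal cliques whose lowest-ranked vertex is its root. The X set blocks every clique that contains an earlier neighbor, so no clique is reported twice even though the subtrees are processed independently.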
Abstract
We present a GPU solution for exact maximal clique enumeration (MCE) that performs a search tree traversal following the Bron-Kerbosch algorithm. Prior works on parallelizing MCE on GPUs perform a breadth-first traversal of the tree, which has limited scalability because of the explosion in the number of tree nodes at deep levels. We propose to parallelize MCE on GPUs by performing depth-first traversal of independent subtrees in parallel. Since MCE suffers from severe load imbalance and high memory capacity requirements, we propose a worker list for dynamic load balancing, as well as partial induced subgraphs and a compact representation of excluded vertex sets to regulate memory consumption. Our evaluation shows that, on a single GPU, our implementation outperforms the state-of-the-art parallel CPU implementation by a geometric mean of 4.9x (up to 16.7x), and that it scales efficiently to multiple GPUs. Our code has been open-sourced to enable further research on accelerating MCE.
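The worker-list idea can be illustrated with a deterministic, single-threaded Python simulation. This is a hypothetical scheme, not the paper's exact mechanism. Each simulated worker keeps an explicit depth-first stack of (R, P, X) frames, and idle workers register in a worker list. After each step, a busy worker with more than one pending frame donates its oldest (shallowest) frame to an idle worker. The function names, the round-robin seeding of per-vertex subtrees, and the donate-oldest policy are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]
Frame = Tuple[Tuple[int, ...], Set[int], Set[int]]  # (R, P, X)


def expand(adj: Graph, frame: Frame, out: List[Tuple[int, ...]]) -> List[Frame]:
    """Expand one search-tree node (Bron-Kerbosch with pivoting); return its children."""
    R, P, X = frame
    if not P and not X:
        out.append(tuple(sorted(R)))  # R is a maximal clique
        return []
    P, X = set(P), set(X)  # local copies; the frame itself is left untouched
    u = max(P | X, key=lambda w: len(P & adj[w]))  # pivot
    children: List[Frame] = []
    for v in sorted(P - adj[u]):
        children.append((R + (v,), P & adj[v], X & adj[v]))
        P.discard(v)  # later siblings must not revisit v,
        X.add(v)      # so it moves to the excluded set
    return children


def simulate_worker_list(adj: Graph, num_workers: int) -> List[Tuple[int, ...]]:
    """Simulate workers that traverse subtrees depth-first and share work via a worker list."""
    stacks: List[List[Frame]] = [[] for _ in range(num_workers)]
    # Seed one independent subtree per vertex, round-robin (hypothetical static assignment).
    for i, v in enumerate(sorted(adj)):
        P = {w for w in adj[v] if w > v}  # later neighbors: candidates
        X = {w for w in adj[v] if w < v}  # earlier neighbors: excluded
        stacks[i % num_workers].append(((v,), P, X))
    worker_list: Deque[int] = deque(w for w in range(num_workers) if not stacks[w])
    out: List[Tuple[int, ...]] = []
    while any(stacks):
        for w in range(num_workers):  # one simulated step per busy worker
            if not stacks[w]:
                continue
            frame = stacks[w].pop()  # newest frame first: depth-first order
            stacks[w].extend(expand(adj, frame, out))
            # Donate the oldest (shallowest) pending frames to idle workers.
            while worker_list and len(stacks[w]) > 1:
                idle = worker_list.popleft()
                stacks[idle].append(stacks[w].pop(0))
            if not stacks[w] and w not in worker_list:
                worker_list.append(w)  # register as idle
    return sorted(out)


if __name__ == "__main__":
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
    print(simulate_worker_list(adj, 2))  # [(0, 1, 2), (2, 3), (4,)]
```

Donating the oldest frame is a common work-sharing heuristic. Frames near the bottom of a depth-first stack tend to root larger subtrees, so one hand-off moves more work. Because frames are moved rather than copied, every search-tree node is still expanded exactly once, and the output does not depend on the number of workers.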
