Memory Efficient GPU-based Label Propagation Algorithm (LPA) for Community Detection on Large Graphs
Subhajit Sahu
TL;DR
The paper addresses the high memory demands of GPU-based LPA for large graphs by introducing memory-efficient variants that replace per-thread hashtables with weighted MG and BM sketches. The proposed νMG8-LPA and νBM-LPA reduce working-set size to achieve $O(|V|)$ space while maintaining competitive time complexity $O(K|E|)$ and acceptable modularity losses. Across large SuiteSparse graphs, these methods dramatically cut memory usage (up to $98\times$) and deliver substantial speedups relative to prior GPU/CPU LPA implementations, with νMG8-LPA showing strong performance on web and social graphs. This work enables scalable, memory-conscious community detection on shared-memory GPUs, with practical implications for handling graphs with billions of edges.
Abstract
Community detection involves grouping the nodes of a graph such that connections are denser within groups than between them. We previously proposed efficient multicore (GVE-LPA) and GPU-based ($ν$-LPA) implementations of the Label Propagation Algorithm (LPA) for community detection. However, these methods incur high memory overhead due to their per-thread/per-vertex hashtables, which makes it challenging to process large graphs on shared-memory systems. In this report, we introduce memory-efficient GPU-based LPA implementations that use weighted Boyer-Moore (BM) and Misra-Gries (MG) sketches. Our new implementation, $ν$MG8-LPA, which uses an 8-slot MG sketch, reduces memory usage by 98x and 44x compared to GVE-LPA and $ν$-LPA, respectively. It is also 2.4x faster than GVE-LPA and only 1.1x slower than $ν$-LPA, with minimal quality loss (4.7%/2.9% drop compared to GVE-LPA/$ν$-LPA).
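To illustrate the idea of replacing a per-vertex hashtable with a fixed-size sketch, the following is a minimal sequential sketch of a generic weighted Misra-Gries summary applied to the labels of one vertex's neighbors. It is not the paper's GPU kernel: the function name `weighted_mg_label`, the `(label, weight)` input format, and the exact decrement rule are assumptions made for illustration, and the actual $ν$MG8-LPA update rule may differ.

```python
def weighted_mg_label(neighbors, k=8):
    """Return a candidate heaviest label using at most k counters.

    neighbors: iterable of (label, edge_weight) pairs for one vertex
               (hypothetical input format, assumed for this sketch).
    k:         number of sketch slots (8 mirrors the "MG8" naming).
    """
    slots = {}  # label -> accumulated counter; never holds more than k entries
    for label, w in neighbors:
        if label in slots:
            # Label already tracked: accumulate its weight.
            slots[label] += w
        elif len(slots) < k:
            # Free slot available: start tracking this label.
            slots[label] = w
        else:
            # Sketch is full: subtract the largest amount that keeps
            # every counter (and the incoming weight) non-negative.
            d = min(w, min(slots.values()))
            for key in list(slots):
                slots[key] -= d
                if slots[key] <= 0:
                    del slots[key]  # slot is freed
            if w - d > 0:
                # A slot was freed above, so the leftover weight fits.
                slots[label] = w - d
    # The surviving label with the largest counter is the candidate.
    return max(slots, key=slots.get) if slots else None


if __name__ == "__main__":
    # Three distinct labels fit in 8 slots, so the result is exact here.
    print(weighted_mg_label([(5, 1.0), (7, 2.5), (5, 1.0)]))
```

With `k=1` the same routine reduces to a weighted Boyer-Moore majority vote, which keeps a single candidate label and counter. In both cases the memory per vertex is a constant number of slots, independent of the vertex degree, at the cost of possibly missing the true heaviest label when no label dominates.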
