CPU vs. GPU for Community Detection: Performance Insights from GVE-Louvain and ν-Louvain
Subhajit Sahu
TL;DR
The paper introduces GVE-Louvain, a highly optimized multicore CPU implementation of the Louvain method for community detection, and ν-Louvain, a GPU-based variant. GVE-Louvain achieves speedups of 5.8x to 50x over state-of-the-art CPU and GPU baselines, processes 560M edges per second on a 3.8B-edge graph, and scales well as threads are added. ν-Louvain performs competitively but generally does not surpass GVE-Louvain, largely because the workload and available parallelism shrink in the algorithm's later passes; this highlights the advantage of CPU flexibility for irregular workloads. Overall, the findings suggest that CPUs offer better practicality and energy efficiency for large-scale community detection, though carefully designed GPU approaches can still be effective for the early, highly parallel passes.
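To make the later-pass argument concrete, below is a minimal sequential Louvain sketch in Python. It is not GVE-Louvain or ν-Louvain, which are parallel and heavily optimized; the function names (`from_edges`, `local_moving`, `aggregate`, `louvain`) and the dict-of-dicts graph representation are illustrative assumptions. The sketch records how many vertices each pass operates on. Parallel implementations typically distribute the local-moving phase across vertices, so this count is a rough proxy for the parallel work available in each pass.

```python
from collections import defaultdict


def from_edges(edges):
    """Build a symmetric weighted adjacency map {u: {v: weight}} with unit weights."""
    adj = defaultdict(dict)
    for u, v in edges:
        adj[u][v] = adj[u].get(v, 0.0) + 1.0
        adj[v][u] = adj[v].get(u, 0.0) + 1.0
    return dict(adj)


def local_moving(adj, tolerance=1e-12):
    """Greedily move each vertex to the neighbouring community with the largest
    positive modularity gain, sweeping until no vertex moves."""
    membership = {v: v for v in adj}  # start from singleton communities
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}  # K_i
    m = sum(degree.values()) / 2.0  # total edge weight
    if m == 0:
        return membership
    sigma = dict(degree)  # Sigma_c: total degree of each community
    moved = True
    while moved:
        moved = False
        for i in adj:
            d, k_i = membership[i], degree[i]
            # K_{i->c}: edge weight from i to each neighbouring community (self-loop ignored).
            links = defaultdict(float)
            for j, w in adj[i].items():
                if j != i:
                    links[membership[j]] += w
            k_id = links.get(d, 0.0)
            best_c, best_gain = d, 0.0
            for c, k_ic in links.items():
                if c == d:
                    continue
                # Modularity gain for moving i from community d to community c.
                gain = (k_ic - k_id) / m - k_i * (sigma[c] - sigma[d] + k_i) / (2 * m * m)
                if gain > best_gain + tolerance:
                    best_c, best_gain = c, gain
            if best_c != d:
                sigma[d] -= k_i
                sigma[best_c] += k_i
                membership[i] = best_c
                moved = True
    return membership


def aggregate(adj, membership):
    """Collapse each community into one super-vertex; returns (new_adj, relabel)."""
    ids = {}
    for v in adj:
        ids.setdefault(membership[v], len(ids))  # renumber communities 0..k-1
    new_adj = {c: defaultdict(float) for c in range(len(ids))}
    for u, nbrs in adj.items():
        cu = ids[membership[u]]
        for v, w in nbrs.items():
            # Summing over ordered pairs keeps each super-vertex degree equal to Sigma_c.
            new_adj[cu][ids[membership[v]]] += w
    relabel = {v: ids[membership[v]] for v in adj}
    return {c: dict(nbrs) for c, nbrs in new_adj.items()}, relabel


def louvain(adj):
    """Alternate local moving and aggregation; returns (communities, per-pass sizes)."""
    community = {v: v for v in adj}  # original vertex -> current super-vertex
    sizes = []
    while True:
        sizes.append(len(adj))  # vertices available as work items in this pass
        membership = local_moving(adj)
        new_adj, relabel = aggregate(adj, membership)
        community = {v: relabel[s] for v, s in community.items()}
        if len(new_adj) == len(adj):  # no vertex moved: converged
            return community, sizes
        adj = new_adj


if __name__ == "__main__":
    # Two triangles joined by the bridge edge (2, 3).
    graph = from_edges([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
    communities, sizes = louvain(graph)
    print(communities)
    print(sizes)  # [6, 2]
```

On this toy graph the first pass works on 6 vertices and the second on only 2, because each triangle has collapsed into a single super-vertex. On large graphs the same collapse typically leaves far fewer independent work items in later passes, which is where a GPU's massive parallelism is hardest to exploit.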
Abstract
Community detection involves identifying natural divisions in networks, a crucial task for many large-scale applications. This report presents GVE-Louvain, one of the most efficient multicore implementations of the Louvain algorithm, a high-quality method for community detection. Running on a dual 16-core Intel Xeon Gold 6226R server, GVE-Louvain outperforms Vite, Grappolo, NetworKit Louvain, and cuGraph Louvain (on an NVIDIA A100 GPU) by factors of 50x, 22x, 20x, and 5.8x, respectively, achieving a processing rate of 560M edges per second on a 3.8B-edge graph. It also scales efficiently, improving performance by 1.6x for every doubling of threads. The report further presents ν-Louvain, a GPU-based implementation. When evaluated on an NVIDIA A100 GPU, ν-Louvain performs only on par with GVE-Louvain, largely due to reduced workload and parallelism in later algorithmic passes. These results suggest that CPUs, with their flexibility in handling irregular workloads, may be better suited for community detection tasks.
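The headline figures can be turned into rough back-of-the-envelope estimates. The derivation below assumes that the processing rate is the edge count divided by total runtime, and that the 1.6x gain per thread doubling compounds multiplicatively; both are simplifying assumptions, and the 32-thread case is only an illustration matching the server's 32 physical cores.

```latex
t \approx \frac{|E|}{\text{rate}}
  = \frac{3.8 \times 10^{9}\ \text{edges}}{5.6 \times 10^{8}\ \text{edges/s}}
  \approx 6.8\ \text{s}

S(T) \approx 1.6^{\log_2 T} = T^{\log_2 1.6} \approx T^{0.68},
\qquad S(32) \approx 1.6^{5} \approx 10.5
```

Under these assumptions, 32 threads would deliver roughly 10.5x the single-thread performance, a parallel efficiency of about 33%.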
