Memory-Efficient Community Detection on Large Graphs Using Weighted Sketches
Subhajit Sahu
TL;DR
This work tackles the memory bottleneck of parallel community detection on large graphs by replacing per-thread collision-free hashtables with weighted Misra-Gries (MG) sketches in Louvain, Leiden, and LPA. The MG-based approach substantially reduces memory usage while incurring only a small drop in modularity and a moderate runtime overhead. Across 13 large real-world graphs, the MG-enabled methods achieve competitive quality with substantial memory savings, supporting the practicality of memory-efficient sketches for large-scale graph clustering. Because the memory cost of per-thread hashtables grows with the number of threads, the technique is relevant to deploying community detection on systems with many cores, where it could outperform memory-intensive methods as thread counts grow.
Abstract
Community detection in graphs identifies groups of nodes with denser connections within the groups than between them. While existing studies often focus on optimizing detection performance, memory constraints become critical when processing large graphs on shared-memory systems. We recently proposed efficient implementations of the Louvain, Leiden, and Label Propagation Algorithms (LPA) for community detection. However, these incur significant memory overhead from their use of collision-free per-thread hashtables. To address this, we introduce memory-efficient alternatives that replace the per-thread hashtables with weighted Misra-Gries (MG) sketches. This reduces the memory demands of the Louvain, Leiden, and LPA implementations, while incurring only a minor quality drop (up to 1%) and moderate runtime penalties. We believe that these approaches, though slightly slower, are well suited to parallel processing and could outperform current memory-intensive techniques on systems with many threads.
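To make the idea concrete, the following is a minimal Python sketch of a weighted Misra-Gries sketch used to pick the most heavily linked neighboring community of a vertex. It is an illustration under stated assumptions, not the implementation evaluated in this work: the function names `mg_accumulate` and `best_community`, the slot count `k`, the dictionary-based storage, and the particular weighted decrement rule (subtract the smaller of the incoming weight and the lightest slot from every slot) are all assumptions chosen for clarity.

```python
def mg_accumulate(sketch, k, community, weight):
    """Add (community, weight) to a weighted Misra-Gries sketch of at most k slots.

    `sketch` is a dict mapping community id -> accumulated (approximate) weight.
    The decrement rule below is one common weighted generalization of
    Misra-Gries and is an assumption, not necessarily the paper's exact rule.
    """
    if community in sketch:
        # Community already tracked: accumulate its edge weight.
        sketch[community] += weight
        return
    if len(sketch) < k:
        # A free slot is available: start tracking this community.
        sketch[community] = weight
        return
    # Sketch is full: subtract the same amount from every slot and from the
    # incoming weight, which frees at least one slot or absorbs the newcomer.
    dec = min(weight, min(sketch.values()))
    for c in list(sketch):
        sketch[c] -= dec
        if sketch[c] <= 0:
            del sketch[c]
    if weight - dec > 0:
        sketch[community] = weight - dec


def best_community(neighbor_links, k):
    """Return the community with the largest sketched weight, or None.

    `neighbor_links` is an iterable of (community id, edge weight) pairs,
    one per neighbor of the vertex being processed (hypothetical input format).
    """
    sketch = {}
    for community, weight in neighbor_links:
        mg_accumulate(sketch, k, community, weight)
    if not sketch:
        return None
    return max(sketch, key=sketch.get)


# Example: a vertex whose neighbors mostly belong to community 1.
links = [(1, 5), (2, 1), (3, 1), (1, 4), (4, 1)]
choice = best_community(links, k=2)
```

The point of the design is that each thread needs only `k` slots, regardless of the number of vertices, whereas a collision-free per-thread hashtable needs space proportional to the number of possible community ids. The trade-off is that the sketched weights are approximate, so a lightly linked community can be evicted before its weight is fully accumulated.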
