Parallel Algorithms for Median Consensus Clustering in Complex Networks
Md Taufique Hussain, Mahantesh Halappanavar, Samrat Chatterjee, Filippo Radicchi, Santo Fortunato, Ariful Azad
TL;DR
This work addresses the challenge of deriving a single, representative clustering of a graph from multiple input partitions by optimizing a median-consensus objective based on Mirkin distance. It introduces a graph-aware greedy algorithm that moves vertices along graph edges to minimize total disagreement with the input partitions, eliminating the need for quadratic memory and enabling parallel execution. A preprocessing step groups homogeneous partitions, after which a consensus is computed for each group; a parallel OpenMP implementation achieves substantial speedups on large graphs. Empirical results on synthetic LFR benchmarks and real networks show improved accuracy over baselines and strong scalability, including a 35x speedup on 64 cores and effective handling of graphs with hundreds of thousands of nodes.
Abstract
We develop an algorithm that finds the consensus of many different clustering solutions of a graph. We formulate the problem as a median set partitioning problem and propose a greedy optimization technique. Unlike other approaches that find median set partitions, our algorithm takes graph structure into account and finds a solution of comparable quality much faster. For graphs with known communities, our consensus partition captures the actual community structure more accurately than alternative approaches. To make the algorithm applicable to large graphs, we remove its sequential dependencies and design a parallel version. Our parallel algorithm achieves a 35x speedup when utilizing 64 processing cores for large real-world graphs from single-cell experiments.
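
To make the median-consensus objective concrete, the following is a minimal sequential sketch in Python, not the authors' implementation. It assumes partitions are given as label lists indexed by vertex and the graph as an adjacency dictionary; the names `mirkin_distance`, `total_distance`, and `greedy_consensus` are hypothetical. The Mirkin distance is computed from cluster sizes and overlaps, so no vertex-by-vertex matrix is stored. The greedy step only tries labels held by a vertex's graph neighbors. For simplicity the sketch recomputes the full objective after each trial move, which an efficient or parallel version would presumably replace with incremental updates.

```python
from collections import Counter


def _sum_sq(counter):
    """Sum of squared counts in a Counter."""
    return sum(c * c for c in counter.values())


def mirkin_distance(p, q):
    """Number of unordered vertex pairs that are co-clustered in exactly
    one of the two labelings p and q (lists indexed by vertex)."""
    overlap = Counter(zip(p, q))  # contingency table of the two labelings
    return (_sum_sq(Counter(p)) + _sum_sq(Counter(q)) - 2 * _sum_sq(overlap)) // 2


def total_distance(candidate, partitions):
    """Median-consensus objective: summed Mirkin distance to all inputs."""
    return sum(mirkin_distance(candidate, p) for p in partitions)


def greedy_consensus(adj, partitions, max_sweeps=10):
    """Greedy local search (illustrative assumption, naive cost updates).

    Starts from the first input partition and moves one vertex at a time
    into the cluster of one of its graph neighbors whenever that lowers
    the objective.
    """
    consensus = list(partitions[0])
    best = total_distance(consensus, partitions)
    for _ in range(max_sweeps):
        improved = False
        for v in range(len(consensus)):
            current = consensus[v]
            # Candidate labels come only from neighbors along graph edges.
            candidates = {consensus[u] for u in adj[v]} - {current}
            for label in candidates:
                consensus[v] = label
                cost = total_distance(consensus, partitions)
                if cost < best:
                    best, current, improved = cost, label, True
            consensus[v] = current  # keep the best label found for v
        if not improved:
            break
    return consensus, best


# Toy example: two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
partitions = [
    [0, 0, 1, 1, 1, 1],  # starting point: vertex 2 placed with the second triangle
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
consensus, cost = greedy_consensus(adj, partitions)
```

In this toy example the search moves vertex 2 into the cluster of its neighbors 0 and 1, which lowers the total disagreement with the three input partitions.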
