Recovering Small Communities in the Planted Partition Model
Martijn Gösgens, Maximilien Dreveton
TL;DR
This work studies the recovery of planted partitions in the Planted Partition Model (PPM) when the number of communities is arbitrarily large and community sizes are heterogeneous. It introduces Diamond Percolation, a parameter-free, triangle-based refinement that derives the detected communities from the observed graph using only the edges whose endpoints have at least two common neighbors, and it measures recovery with a correlation-based criterion $\rho(C_n,T_n)$. The paper proves exact, almost exact, and weak recovery guarantees under mild assumptions, without requiring prior knowledge of the number of communities $k_n$ or of the size distribution, thereby extending classic results for balanced SBM-like models to unbalanced partitions with a growing number of communities. It gives a detailed treatment of power-law partitions, establishing recovery guarantees across regimes for the number of communities and the intra-community density, and discusses practical extensions and future research directions for more complex and more heterogeneous network models. Overall, the results offer a scalable approach with provable guarantees for community detection in networks with heavy-tailed community sizes and an unknown number of communities, built on classic common-neighbor ideas.
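The edge filter described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `diamond_percolation` and the input format (vertices numbered `0..n-1`, edges as pairs) are hypothetical, and the final step, taking the detected communities to be the connected components of the retained edges, is an assumption that this summary does not state explicitly.

```python
def diamond_percolation(n, edges):
    """Sketch: return detected communities as a list of vertex sets.

    Assumes vertices are 0..n-1 and `edges` is an iterable of (u, v) pairs.
    """
    # Build adjacency sets for the observed (undirected, simple) graph.
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Keep an edge only if its endpoints share at least two common neighbors.
    kept = [set() for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            if u < v and len(adj[u] & adj[v]) >= 2:
                kept[u].add(v)
                kept[v].add(u)

    # Assumption: communities are the connected components of the kept graph.
    seen = [False] * n
    communities = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        component, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in kept[x]:
                if not seen[y]:
                    seen[y] = True
                    component.add(y)
                    stack.append(y)
        communities.append(component)
    return communities


# Two 4-cliques joined by the single bridge edge (3, 4).
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
result = diamond_percolation(8, clique_a + clique_b + [(3, 4)])
print([sorted(c) for c in result])  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

In the toy graph, every edge inside a 4-clique has exactly two common neighbors and is kept, while the bridge has none and is dropped, so the two cliques are separated without any tuning parameter.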
Abstract
We analyze community recovery in the planted partition model (PPM) in regimes where the number of communities is arbitrarily large. We examine the three standard recovery regimes: exact recovery, almost exact recovery, and weak recovery. When communities vary in size, traditional accuracy- or alignment-based metrics become unsuitable for assessing the correctness of a predicted partition. To address this, we redefine these recovery regimes using the correlation coefficient, a more versatile metric for comparing partitions. We then demonstrate that $\textit{Diamond Percolation}$, an algorithm based on common neighbors, successfully recovers communities under mild assumptions on edge probabilities, with minimal restrictions on the number and sizes of communities. As a key application, we consider the case where community sizes follow a power-law distribution, a characteristic frequently found in real-world networks. To the best of our knowledge, we provide the first recovery results for such unbalanced partitions.
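The abstract does not define the correlation coefficient. One common reading, assumed here and not confirmed by this summary, is the Pearson correlation between the pair-level indicators "vertices $i$ and $j$ are in the same community" under the two partitions. With $N = \binom{n}{2}$ vertex pairs, $m_C$ and $m_T$ the numbers of intra-community pairs in each partition, and $m_{CT}$ the number of pairs that are intra-community in both, this gives $\rho = (N m_{CT} - m_C m_T) / \sqrt{m_C (N - m_C)\, m_T (N - m_T)}$. The sketch below computes that quantity; the function name `partition_correlation` and the label-list input format are hypothetical.

```python
from collections import Counter
from math import comb, sqrt


def partition_correlation(labels_c, labels_t):
    """Pearson correlation between the pair co-membership indicators
    of two partitions, each given as one community label per vertex.

    Assumption: this is the intended correlation coefficient; the
    abstract itself does not spell out the definition.
    """
    n = len(labels_c)
    if n != len(labels_t):
        raise ValueError("both partitions must label the same vertices")

    total_pairs = comb(n, 2)
    # Number of vertex pairs placed together by each partition.
    m_c = sum(comb(size, 2) for size in Counter(labels_c).values())
    m_t = sum(comb(size, 2) for size in Counter(labels_t).values())
    # Pairs placed together by both partitions (same cell of the overlap).
    m_ct = sum(comb(size, 2) for size in Counter(zip(labels_c, labels_t)).values())

    denom = sqrt(m_c * (total_pairs - m_c) * m_t * (total_pairs - m_t))
    if denom == 0:
        # Happens when a partition groups no pairs or all pairs together.
        raise ValueError("correlation is undefined for a trivial partition")
    return (total_pairs * m_ct - m_c * m_t) / denom


print(partition_correlation([0, 0, 1, 1], [5, 5, 7, 7]))  # 1.0
print(partition_correlation([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5
```

Because the computation only counts pairs, it is invariant to relabeling communities and needs no alignment between predicted and true communities, which is why a measure of this kind remains usable when community sizes differ widely.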
