Exact Community Recovery in the Geometric SBM
Julia Gaudio, Xiaochun Niu, Ermin Wei
TL;DR
This work identifies the information-theoretic threshold for exact community recovery in the two-community Geometric SBM, where vertex positions are drawn from a Poisson point process and an edge can form only between vertices within a prescribed distance of each other, with a probability that depends on their labels. It introduces a two-phase algorithm whose running time is linear in the number of edges. Phase I achieves almost-exact recovery by propagating labels across sufficiently occupied spatial blocks along a connected visibility graph, and Phase II refines these labels using a robust Poisson testing procedure. The analysis relies on a multitype Poisson framework and a Chernoff–Hellinger divergence criterion that characterizes when exact recovery is achievable. The algorithm achieves exact recovery above the threshold, and a matching impossibility result shows that no algorithm succeeds below it. The work advances the understanding of local-to-global amplification in spatially embedded network models and suggests extensions to multiple communities and related spatial inference tasks.
Abstract
We study the problem of exact community recovery in the Geometric Stochastic Block Model (GSBM), where each vertex has an unknown community label as well as a known position, generated according to a Poisson point process in $\mathbb{R}^d$. Edges are formed independently conditioned on the community labels and positions, where vertices may only be connected by an edge if they are within a prescribed distance of each other. The GSBM thus favors the formation of dense local subgraphs, which commonly occur in real-world networks, a property that makes the GSBM qualitatively very different from the standard Stochastic Block Model (SBM). We propose a linear-time algorithm for exact community recovery, which succeeds down to the information-theoretic threshold, confirming a conjecture of Abbe, Baccelli, and Sankararaman. The algorithm involves two phases. The first phase exploits the density of local subgraphs to propagate estimated community labels among sufficiently occupied subregions, and produces an almost-exact vertex labeling. The second phase then refines the initial labels using a Poisson testing procedure. Thus, the GSBM enjoys local-to-global amplification, just as the SBM does, with the advantage of admitting an information-theoretically optimal, linear-time algorithm.
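To make the generative model in the abstract concrete, the following is a minimal sketch of how one might sample a two-community GSBM instance. It is an illustration under stated assumptions, not the authors' code: the function name `sample_gsbm` and the parameters `side`, `intensity`, `radius`, `p_in`, and `p_out` are hypothetical, labels are assumed to be uniform on {-1, +1}, and distances are plain Euclidean distances on a cube with no boundary wrap-around.

```python
import numpy as np

def sample_gsbm(side, intensity, radius, p_in, p_out, dim=2, seed=0):
    """Sample positions, labels, and edges of a two-community GSBM (illustrative)."""
    rng = np.random.default_rng(seed)

    # Poisson point process on [0, side]^dim: draw a Poisson number of
    # points, then place them independently and uniformly in the cube.
    n = rng.poisson(intensity * side ** dim)
    positions = rng.uniform(0.0, side, size=(n, dim))

    # Assumption: each vertex gets a label in {-1, +1} uniformly at random.
    labels = rng.choice([-1, 1], size=n)

    edges = []
    for u in range(n):
        # Euclidean distances from vertex u to every later vertex.
        dists = np.linalg.norm(positions[u + 1:] - positions[u], axis=1)
        for offset in np.flatnonzero(dists <= radius):
            v = u + 1 + int(offset)
            # Only pairs within `radius` may connect; the edge probability
            # depends on whether the two labels agree.
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return positions, labels, edges

positions, labels, edges = sample_gsbm(
    side=10.0, intensity=1.0, radius=1.5, p_in=0.8, p_out=0.2
)
```

The pairwise distance loop is quadratic in the number of vertices and is used here only for clarity; a grid of cells with side length `radius` would restrict the search to neighboring cells.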
