Exact Community Recovery under Side Information: Optimality of Spectral Algorithms
Julia Gaudio, Nirmit Joshi
TL;DR
This work addresses exact community recovery in two-community models with node-attributed side information. It introduces a simple yet optimal spectral algorithm that incorporates side information through a log-likelihood shift and uses the leading eigenvectors of the observation matrix to mimic genie-aided estimators, achieving exact recovery down to the information-theoretic limit, that is, whenever $I^*>1$. The analysis builds on entrywise eigenvector techniques to show that a carefully weighted combination of the top eigenvectors, plus side-information terms, approximates the genie scores with $\ell_\infty$ error $o(\log n)$. This yields the information-theoretic threshold in ROS and SBM across the Gaussian and Bernoulli observation regimes. The results unify several exact-recovery settings (SBM, Submatrix Localization, $\mathbb{Z}_2$-Synchronization) under a single spectral framework and provide nearly linear-time algorithms with provable guarantees. For scalable community detection in networks with node attributes, the approach offers provable optimality and efficiency without multi-matrix or multi-stage refinements.
Abstract
We study the problem of exact community recovery in general two-community block models in the presence of node-attributed side information. We allow for a very general side information channel for node attributes. For pairwise (edge) observations, we consider both Bernoulli and Gaussian matrix models, capturing the Stochastic Block Model, Submatrix Localization, and $\mathbb{Z}_2$-Synchronization as special cases. A recent work of Dreveton et al. (2024) characterized the information-theoretic limit of a very general exact recovery problem with side information. In this paper, we show algorithmic achievability in the above important cases by designing a simple but optimal spectral algorithm that incorporates side information (when present) along with the eigenvectors of the pairwise observation matrix. Using the powerful tool of entrywise eigenvector analysis of Abbe et al. (2020), we show that our spectral algorithm can mimic the so-called genie-aided estimators, where the $i^{\mathrm{th}}$ genie-aided estimator optimally computes the estimate of the $i^{\mathrm{th}}$ label when all remaining labels are revealed by a genie. This perspective provides a unified understanding of the optimality of spectral algorithms for various exact recovery problems in a recent line of work.
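To make the idea of a spectral score mimicking a genie-aided score concrete, the following is a minimal sketch for one special case only: $\mathbb{Z}_2$-Synchronization with Gaussian noise, where each label is also observed through a binary symmetric channel. It is not the algorithm as specified in the paper. The model scaling, the flip probability `alpha`, the signal strength `mu`, the sign-alignment rule, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def simulate(n, mu, alpha, rng):
    """Hypothetical toy model (an assumption, not the paper's exact setup).

    A = mu * sigma sigma^T + W with symmetric unit-variance Gaussian noise W,
    and side information y_i = sigma_i flipped independently w.p. alpha.
    """
    sigma = rng.choice([-1, 1], size=n)
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2.0)          # symmetric, off-diagonal variance 1
    A = mu * np.outer(sigma, sigma) + W
    np.fill_diagonal(A, 0.0)              # no self-observations
    flips = rng.random(n) < alpha
    y = np.where(flips, -sigma, sigma)
    return sigma, A, y


def genie_scores(A, sigma, y, mu, alpha):
    """Log-likelihood ratio for each label when all other labels are known.

    In this Gaussian model the pairwise part is 2 * mu * sum_{j != i} A_ij sigma_j
    (the diagonal of A is zero), and the side-information part is the
    binary-symmetric-channel log-likelihood ratio.
    """
    return 2.0 * mu * (A @ sigma) + y * np.log((1.0 - alpha) / alpha)


def spectral_scores(A, y, mu, alpha):
    """Replace the unknown labels by the rescaled top eigenvector.

    Since A u = lam * u, using sqrt(n) * u in place of sigma gives
    A sigma ~ sqrt(n) * lam * u. The global sign is fixed by agreement with
    the side information (an illustrative choice).
    """
    n = A.shape[0]
    vals, vecs = np.linalg.eigh(A)        # eigenvalues in ascending order
    lam, u = vals[-1], vecs[:, -1]
    s = 1.0 if u @ y >= 0 else -1.0
    side = y * np.log((1.0 - alpha) / alpha)   # log-likelihood shift
    return 2.0 * mu * np.sqrt(n) * lam * s * u + side
```

Taking `np.sign` of either score vector gives a label estimate. In this toy setting one can compare `spectral_scores` against `genie_scores` entrywise to see how closely the eigenvector-based score tracks the genie-aided one. The full eigendecomposition is used here only for brevity; a nearly linear-time variant would compute just the leading eigenvector, for example by power iteration.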
