Robust recovery for stochastic block models, simplified and generalized

Sidhanth Mohanty; Prasad Raghavendra; David X. Wu

TL;DR

This work establishes that robust recovery of communities in sparse SBMs under adversarial edge corruptions is possible once the KS threshold is crossed, i.e. $\lambda_2(T)^2 d>1$. It develops a three‑piece strategy: leverage Bethe Hessian outliers to identify community directions, apply a robust sparse PCA‑style subspace recovery to cope with sparse adversarial noise, and round the recovered subspace into a labeling that remains well correlated with the planted partition. The authors prove tight spectral properties, including a controlled number of outlier eigenvalues and a degree truncation argument that preserves the signal, extending robust recovery to arbitrary fixed $k$ and matching the KS threshold as the computational barrier. Overall, the approach yields a principled, robust spectral pipeline that can tolerate $\Omega(n)$ corruptions and deliver constant correlation with the true communities in polynomial time, advancing the understanding of information–computational tradeoffs in SBM inference.
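As a quick numerical illustration of the threshold condition $\lambda_2(T)^2 d > 1$, the sketch below assumes that $T$ is a row-stochastic transition matrix between communities and that $\lambda_2(T)$ is its second-largest eigenvalue in absolute value. This summary does not define either, so both are assumptions. The function name `ks_snr` and the two-community parametrization by `a` and `b` (edge probabilities `a/n` within and `b/n` across communities) are hypothetical and used only for illustration.

```python
import numpy as np

def ks_snr(T, d):
    """Return d * lambda_2(T)^2, where lambda_2 is taken (by assumption)
    to be the second-largest eigenvalue of T in absolute value."""
    mags = np.sort(np.abs(np.linalg.eigvals(np.asarray(T, dtype=float))))[::-1]
    return d * mags[1] ** 2

def symmetric_two_community(a, b):
    """Hypothetical helper: balanced 2-community model with edge
    probabilities a/n (within) and b/n (across).
    Average degree is (a + b) / 2; T is the normalized 2x2 block matrix."""
    d = (a + b) / 2.0
    T = np.array([[a, b], [b, a]], dtype=float) / (a + b)
    return T, d

# Above the threshold: lambda_2 = 2/3, d = 3, so the ratio is 4/3 > 1.
T_hi, d_hi = symmetric_two_community(5.0, 1.0)
snr_hi = ks_snr(T_hi, d_hi)

# Below the threshold: lambda_2 = 1/5, d = 2.5, so the ratio is 0.1 < 1.
T_lo, d_lo = symmetric_two_community(3.0, 2.0)
snr_lo = ks_snr(T_lo, d_lo)

print(snr_hi > 1, snr_lo > 1)
```

In this two-community special case the condition reduces to the familiar form $(a-b)^2 > 2(a+b)$.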

Abstract

We study the problem of $\textit{robust community recovery}$: efficiently recovering communities in sparse stochastic block models in the presence of adversarial corruptions. In the absence of adversarial corruptions, there are efficient algorithms when the $\textit{signal-to-noise ratio}$ exceeds the $\textit{Kesten--Stigum (KS) threshold}$, widely believed to be the computational threshold for this problem. The question we study is: does the computational threshold for robust community recovery also lie at the KS threshold? We answer this question affirmatively, providing an algorithm for robust community recovery for arbitrary stochastic block models on any constant number of communities, generalizing the work of Ding, d'Orsi, Nasser & Steurer on an efficient algorithm above the KS threshold in the case of $2$-community block models. There are three main ingredients to our work: (i) The Bethe Hessian of the graph is defined as $H_G(t) \triangleq (D_G-I)t^2 - A_Gt + I$, where $D_G$ is the diagonal matrix of degrees and $A_G$ is the adjacency matrix. Empirical work suggested that the Bethe Hessian for the stochastic block model has outlier eigenvectors corresponding to the communities right above the Kesten--Stigum threshold. We formally confirm the existence of outlier eigenvalues for the Bethe Hessian by explicitly constructing outlier eigenvectors from the community vectors. (ii) We develop an algorithm for a variant of robust PCA on sparse matrices: specifically, an algorithm to partially recover top eigenspaces from adversarially corrupted sparse matrices under mild delocalization constraints. (iii) We give a rounding algorithm that turns vector assignments of vertices into a community assignment, inspired by the algorithm of Charikar & Wirth [CW04] for $2$XOR.
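To make the definition $H_G(t) = (D_G - I)t^2 - A_G t + I$ concrete, here is a minimal sketch that builds the matrix for a small graph with dense NumPy arrays. The function name `bethe_hessian`, the toy graph, and the choice $t = 0.5$ are illustrative assumptions; the abstract does not say which value of $t$ the algorithm uses, and this is not the authors' implementation.

```python
import numpy as np

def bethe_hessian(A, t):
    """Build H(t) = (D - I) t^2 - A t + I for a symmetric 0/1
    adjacency matrix A, where D is the diagonal degree matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))   # diagonal matrix of vertex degrees
    I = np.eye(n)
    return (D - I) * t**2 - A * t + I

# Toy graph (an assumption for illustration): a triangle on vertices
# 0, 1, 2 with a pendant vertex 3 attached to vertex 2.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

H = bethe_hessian(A, 0.5)

# H is symmetric, so eigvalsh applies; it returns eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(H)
print(eigenvalues)
```

Two sanity checks follow directly from the formula: at $t = 0$ the matrix is the identity, and at $t = 1$ it equals $D_G - A_G$, the graph Laplacian. For graphs of realistic size, a sparse representation (for example `scipy.sparse`) would be the natural choice instead of dense arrays.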

Paper Structure (26 sections, 29 theorems, 67 equations, 2 algorithms)

Introduction
Related work
Organization
Technical overview
Outlier eigenvectors for the Bethe Hessian
Constructing the outlier eigenspace
Robust PCA for sparse matrices
Rounding to communities
Preliminaries
Recovery algorithm
Analysis of algorithm
The number of outlier eigenvalues
Lower bound on the number of outlier eigenvalues
Outlier eigenspace after degree truncation
Robust recovery of a subspace
...and 11 more sections

Key Result

Theorem 1.2

Let $(\mathrm{M}, \pi, d)$ be SBM parameters such that $d$ is above the KS threshold, and let $\boldsymbol{G},{\boldsymbol{x}}\sim\mathrm{SBM}_n(\mathrm{M}, \pi, d)$. There exists $\delta = \delta(\mathrm{M}, \pi, d) > 0$ such that the following holds. There is a polynomial time algorithm that takes

Theorems & Definitions (60)

Definition 1.1: Informal
Theorem 1.2: Informal statement of main theorem
Remark 1.3: Robustness against node corruptions
Proposition 2.1: Bethe Hessian spectrum
Proposition 2.2
Remark 2.3
Lemma 3.3
proof
Remark 4.2
Remark 4.3
...and 50 more
