Efficient Graph Matching for Correlated Stochastic Block Models
Shuwen Chai, Miklós Z. Rácz
TL;DR
The paper gives an efficient algorithm for graph matching in correlated stochastic block models (SBMs) with two balanced communities, in the sparse regime of logarithmic average degree. It uses chandelier-based signature counts and a color-coding scheme, and combines almost-exact matching with seeded graph matching to achieve exact recovery whenever this is information-theoretically possible. The authors prove mean and variance bounds for the chandelier-based similarity scores under two regimes determined by the Chernoff–Hellinger divergence, and they apply the matching results to exact community recovery from multiple correlated graphs. The work extends prior chandelier results from Erdős–Rényi graphs to SBMs and handles the estimation errors that arise because, in the relevant parameter regimes, community labels cannot be recovered exactly from a single graph.
Abstract
We study learning problems on correlated stochastic block models with two balanced communities. Our main result gives the first efficient algorithm for graph matching in this setting. In the most interesting regime where the average degree is logarithmic in the number of vertices, this algorithm correctly matches all but a vanishing fraction of vertices with high probability, whenever the edge correlation parameter $s$ satisfies $s^2 > \alpha \approx 0.338$, where $\alpha$ is Otter's tree-counting constant. Moreover, we extend this to an efficient algorithm for exact graph matching whenever this is information-theoretically possible, positively resolving an open problem of Rácz and Sridhar (NeurIPS 2021). Our algorithm generalizes the recent breakthrough work of Mao, Wu, Xu, and Yu (STOC 2023), which is based on centered subgraph counts of a large family of trees termed chandeliers. A major technical challenge that we overcome is dealing with the additional estimation errors that are necessarily present due to the fact that, in relevant parameter regimes, the latent community partition cannot be exactly recovered from a single graph. As an application of our results, we give an efficient algorithm for exact community recovery using multiple correlated graphs in parameter regimes where it is information-theoretically impossible to do so using just a single graph.
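
To make the setting concrete, the following is a minimal sketch of how a pair of correlated SBM graphs is commonly generated: a parent SBM graph is sampled, each parent edge is kept independently with probability $s$ in each of the two graphs, and the vertices of the second graph are relabeled by a hidden permutation. The abstract does not spell out this construction, so the parametrization below (within-community probability `a*log(n)/n`, cross-community probability `b*log(n)/n`, communities of size exactly `n // 2`) and all function names are assumptions for illustration. The sketch only samples the model and checks the condition $s^2 > \alpha$; it does not implement the chandelier-based matching algorithm.

```python
import math
import random

OTTER_ALPHA = 0.338  # approximate value of Otter's constant quoted in the abstract


def sample_correlated_sbm(n, a, b, s, seed=0):
    """Sample two correlated SBM graphs by subsampling a shared parent graph.

    Assumed parametrization (not stated in the abstract): within-community
    edge probability a*log(n)/n, cross-community probability b*log(n)/n.
    """
    rng = random.Random(seed)
    # Two balanced communities: exactly n // 2 vertices get label +1.
    labels = [1] * (n // 2) + [-1] * (n - n // 2)
    rng.shuffle(labels)
    p_in = min(1.0, a * math.log(n) / n)
    p_out = min(1.0, b * math.log(n) / n)

    # Hidden permutation: parent vertex i is called perm[i] in the second graph.
    perm = list(range(n))
    rng.shuffle(perm)

    edges1, edges2 = set(), set()
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() >= p:
                continue  # no edge in the parent graph
            # Each parent edge survives independently with probability s per graph.
            if rng.random() < s:
                edges1.add((i, j))
            if rng.random() < s:
                u, v = perm[i], perm[j]
                edges2.add((min(u, v), max(u, v)))
    return labels, perm, edges1, edges2


def above_otter_threshold(s):
    """Check the almost-exact matching condition s^2 > alpha from the abstract."""
    return s * s > OTTER_ALPHA
```

In this sketch, graph matching means recovering `perm` from `edges1` and `edges2` alone, without access to `labels`. For example, `above_otter_threshold(0.6)` holds since $0.36 > 0.338$, while `above_otter_threshold(0.5)` does not.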
