Efficient Graph Matching for Correlated Stochastic Block Models

Shuwen Chai, Miklós Z. Rácz

TL;DR

The paper addresses the problem of efficient graph matching for correlated stochastic block models with two balanced communities in the sparse logarithmic-degree regime. It introduces chandelier-based signature counts and a color-coding scheme, combining almost-exact matching with seeded graph matching to achieve exact recovery when information-theoretically possible. The authors prove rigorous mean and variance bounds for the chandelier-based similarity scores under two regimes determined by the Chernoff–Hellinger divergence, and they extend these results to exact community recovery using multiple correlated graphs. The work extends prior chandelier results from Erdős–Rényi graphs to SBMs, handles the estimation errors in community labels that arise because the partition cannot be exactly recovered from a single graph, and enables exact community recovery from multiple correlated graphs in regimes where a single graph does not suffice.

Abstract

We study learning problems on correlated stochastic block models with two balanced communities. Our main result gives the first efficient algorithm for graph matching in this setting. In the most interesting regime where the average degree is logarithmic in the number of vertices, this algorithm correctly matches all but a vanishing fraction of vertices with high probability, whenever the edge correlation parameter $s$ satisfies $s^2 > \alpha \approx 0.338$, where $\alpha$ is Otter's tree-counting constant. Moreover, we extend this to an efficient algorithm for exact graph matching whenever this is information-theoretically possible, positively resolving an open problem of Rácz and Sridhar (NeurIPS 2021). Our algorithm generalizes the recent breakthrough work of Mao, Wu, Xu, and Yu (STOC 2023), which is based on centered subgraph counts of a large family of trees termed chandeliers. A major technical challenge that we overcome is dealing with the additional estimation errors that are necessarily present due to the fact that, in relevant parameter regimes, the latent community partition cannot be exactly recovered from a single graph. As an application of our results, we give an efficient algorithm for exact community recovery using multiple correlated graphs in parameter regimes where it is information-theoretically impossible to do so using just a single graph.

Paper Structure

This paper contains 56 sections, 40 theorems, 257 equations, 8 figures, 4 algorithms.

Key Result

Theorem 1.1

Fix constants $a \neq b > 0$ and $s \in [0,1]$. Let $(G_1, G_2)\sim \mathrm{CSBM}(n,a\frac{\log n}{n},b\frac{\log n}{n},s)$. For any $\varepsilon > 0$, if $s^{2} \geq \alpha + \varepsilon$, then the following holds.
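
To make the model in the theorem concrete, the following is a minimal sketch of how a pair $(G_1, G_2)$ might be sampled. It assumes the usual subsampling construction (a parent SBM whose edges are kept independently with probability $s$ in each child, followed by a random relabeling of the second graph), which this excerpt does not spell out. The function name `sample_csbm`, the i.i.d. uniform labels, and the edge-set representation are illustrative choices, not the authors' code.

```python
import math
import random

OTTER_ALPHA = 0.338  # approximate value of Otter's constant quoted in the abstract


def sample_csbm(n, a, b, s, seed=0):
    """Sample (labels, g1, g2, pi) under an assumed subsampling construction."""
    rng = random.Random(seed)
    p = a * math.log(n) / n  # within-community edge probability
    q = b * math.log(n) / n  # cross-community edge probability
    # Assumption: labels are i.i.d. uniform on {+1, -1} (balanced on average).
    labels = [rng.choice((1, -1)) for _ in range(n)]
    g1, g2_aligned = set(), set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < (p if labels[i] == labels[j] else q):
                # Parent edge present: keep it in each child independently w.p. s.
                if rng.random() < s:
                    g1.add((i, j))
                if rng.random() < s:
                    g2_aligned.add((i, j))
    # Hide the vertex correspondence by relabeling the second graph.
    pi = list(range(n))
    rng.shuffle(pi)
    g2 = {(min(pi[i], pi[j]), max(pi[i], pi[j])) for i, j in g2_aligned}
    return labels, g1, g2, pi


labels, g1, g2, pi = sample_csbm(n=200, a=6.0, b=2.0, s=0.8)
# Edge counts of both graphs, and whether s^2 clears the quoted threshold.
print(len(g1), len(g2), 0.8 ** 2 >= OTTER_ALPHA)
```

Graph matching then asks for the hidden permutation `pi` given only `g1` and `g2`. The quadratic loop over vertex pairs is only meant for small `n`.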

Figures (8)

  • Figure 1: Schematic illustrating two-community correlated SBMs; see the text for details. (Figure reproduced from RS21 with permission.)
  • Figure 2: Phase diagram for graph matching on $(G_1,G_2) \sim \mathrm{CSBM}(n, \frac{a\log n}{n}, \frac{b\log n}{n}, s)$. The red diagonal line depicts the case $a=b$, where the model reduces to correlated Erdős–Rényi graphs. Black regions: exact graph matching is possible and can be done efficiently for each community separately by applying the graph matching algorithm for correlated Erdős–Rényi graphs; Green regions: exact graph matching is possible and can be done efficiently; Light green regions: exact graph matching is impossible, but almost exact graph matching is possible and can be done efficiently; Cyan regions: exact graph matching is possible and can be done efficiently by first recovering the community labels almost exactly; Yellow regions: exact graph matching is impossible, but almost exact graph matching can be done efficiently by first recovering the community labels almost exactly.
  • Figure 3: Phase diagram for exact community recovery with fixed $s$ on correlated SBMs. Green regions: exact community recovery is possible from $G_1$ alone and can be done efficiently; Light green regions: exact community recovery is possible from $(G_1, G_2)$ but impossible from $G_1$ alone; here exact graph matching can be done efficiently, and therefore so can exact community recovery; Violet regions: exact community recovery is impossible from $G_1$ alone; from $(G_1,G_2)$ it is impossible if $s^2(\frac{a+b}{2})+s(1-s)D_{+}(a,b)<1$ and possible if $s^2(\frac{a+b}{2})+s(1-s)D_{+}(a,b)>1$ (GRS22). It is unknown whether there exists an efficient algorithm for exact community recovery in this regime.
  • Figure 4: A chandelier.
  • Figure 5: Decomposition of a decorated tree into three sequences of trees. Edges that are $2$-, $3$-, and $4$-decorated are painted red, green, and blue, respectively. The root of each subtree is marked by a larger node and annotated as $r(T^{(i)}_j)$, where $i$ is the decoration count and $j$ is the subtree's position in its sequence.
  • ...and 3 more figures
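
The two-graph condition quoted in the Figure 3 caption can be evaluated numerically. The sketch below assumes the common definition of the Chernoff–Hellinger divergence, $D_{+}(a,b) = (\sqrt{a}-\sqrt{b})^2/2$, which is not stated in this excerpt. The function names are hypothetical.

```python
import math


def chernoff_hellinger(a, b):
    # Assumed definition: D_+(a, b) = (sqrt(a) - sqrt(b))^2 / 2.
    return (math.sqrt(a) - math.sqrt(b)) ** 2 / 2


def two_graph_quantity(a, b, s):
    # Quantity from the Figure 3 caption: s^2 (a + b)/2 + s (1 - s) D_+(a, b).
    return s ** 2 * (a + b) / 2 + s * (1 - s) * chernoff_hellinger(a, b)


# Example parameters (illustrative only): a = 9, b = 1, s = 0.5.
value = two_graph_quantity(9.0, 1.0, 0.5)
print(value, "above 1" if value > 1 else "not above 1")
```

Per the caption, within the violet regions a value above 1 means exact community recovery from $(G_1, G_2)$ is possible and a value below 1 means it is impossible. The function does not check whether the given parameters actually fall in that region.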

Theorems & Definitions (92)

  • Theorem 1.1
  • Theorem 1.2
  • Remark 1
  • Definition 2.1: $(L, M, K, R)$-chandelier (mao2022chandelier)
  • Definition 2.2: $(L, M, K, R, D)$-chandelier
  • Lemma 2.3
  • proof
  • Lemma 2.4
  • proof
  • Theorem 2.5
  • ...and 82 more