The Power of Two Matrices in Spectral Algorithms for Community Recovery

Souvik Dhara, Julia Gaudio, Elchanan Mossel, Colin Sandon

TL;DR

This paper analyzes a two-matrix spectral algorithm for the problem of identifying latent community structure in large random graphs and shows that spectral algorithms based on two matrices are optimal and succeed in recovering communities up to the information theoretic threshold.

Abstract

Spectral algorithms are some of the main tools in optimization and inference problems on graphs. Typically, the graph is encoded as a matrix and eigenvectors and eigenvalues of the matrix are then used to solve the given graph problem. Spectral algorithms have been successfully used for graph partitioning, hidden clique recovery and graph coloring. In this paper, we study the power of spectral algorithms using two matrices in a graph partitioning problem. We use two different matrices resulting from two different encodings of the same graph and then combine the spectral information coming from these two matrices. We analyze a two-matrix spectral algorithm for the problem of identifying latent community structure in large random graphs. In particular, we consider the problem of recovering community assignments exactly in the censored stochastic block model, where each edge status is revealed independently with some probability. We show that spectral algorithms based on two matrices are optimal and succeed in recovering communities up to the information theoretic threshold. Further, we show that for most choices of the parameters, any spectral algorithm based on one matrix is suboptimal. The latter observation is in contrast to our prior works (2022a, 2022b) which showed that for the symmetric Stochastic Block Model and the Planted Dense Subgraph problem, a spectral algorithm based on one matrix achieves the information theoretic threshold. We additionally provide more general geometric conditions for the (sub)-optimality of spectral algorithms.
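
The abstract describes two ingredients: a censored block model in which each pair's edge status is revealed independently with some probability, and spectral algorithms that read communities off eigenvectors of a matrix encoding of the observed graph. The Python sketch below illustrates both for k = 2 communities. It is not the paper's Spectral-One or Spectral-Two procedure. It rests on assumptions not taken from the paper: pairs are revealed with probability min(1, t·log(n)/n), revealed edges are encoded as +1 and revealed non-edges as a hypothetical weight -y, and the partition comes from the sign of a single leading eigenvector. All helper names (`sample_csbm`, `signed_adjacency`, `top_eigenvector`, `spectral_partition`, `agreement`) are hypothetical.

```python
import math
import random


def sample_csbm(n, rho, P, t, rng):
    """Sample a censored SBM instance (hypothetical helper).

    Assumptions: each vertex pair is revealed independently with
    probability min(1, t * log(n) / n); a revealed pair is an edge with
    probability P[a][b], where a and b are the endpoints' communities.
    Returns (sigma, obs), where obs[u][v] is None for a censored pair,
    1 for a revealed edge and 0 for a revealed non-edge.
    """
    k = len(rho)
    sigma = [rng.choices(range(k), weights=rho)[0] for _ in range(n)]
    q = min(1.0, t * math.log(n) / n)
    obs = [[None] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < q:
                is_edge = rng.random() < P[sigma[u]][sigma[v]]
                obs[u][v] = obs[v][u] = 1 if is_edge else 0
    return sigma, obs


def signed_adjacency(obs, y=1.0):
    """Map revealed edges to +1, revealed non-edges to -y, censored pairs to 0.

    The non-edge weight y is a hypothetical knob: different values of y
    give different matrix encodings of the same observations.
    """
    return [[0.0 if s is None else (1.0 if s == 1 else -y) for s in row]
            for row in obs]


def top_eigenvector(A, rng, iters=100):
    """Power iteration toward the eigenvector of largest |eigenvalue|."""
    n = len(A)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        norm = math.sqrt(sum(v * v for v in ax)) or 1.0
        x = [v / norm for v in ax]
    return x


def spectral_partition(A, rng):
    """Split vertices into two groups by the sign of the top eigenvector."""
    return [0 if v >= 0 else 1 for v in top_eigenvector(A, rng)]


def agreement(sigma, sigma_hat):
    """Fraction of correctly labeled vertices, up to swapping the two labels."""
    hits = sum(s == h for s, h in zip(sigma, sigma_hat))
    return max(hits, len(sigma) - hits) / len(sigma)
```

For example, `sample_csbm(120, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], t=15, rng=random.Random(0))` yields a densely revealed instance on which sign rounding typically labels almost every vertex correctly. Near the threshold $t_c$ such naive rounding is not expected to suffice. Calling `signed_adjacency` with two different values of `y` produces two distinct encodings of the same observations, which is one simple way to picture the "two matrices" of the abstract.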

Paper Structure (25 sections, 25 theorems, 124 equations, 1 figure)

Key Result

Theorem 1.4

Let $G \sim \textsc{CSBM}_n^k(\rho, P, t)$. If $t<t_c$, then for any estimator $\hat{\sigma}$,

Figures (1)

  • Figure 1: Visualizing dissonance ranges of two communities near $t_c$.

Theorems & Definitions (66)

  • Definition 1.1: Censored Stochastic Block Model (CSBM)
  • Definition 1.2: Exact recovery
  • Definition 1.3: Chernoff–Hellinger divergence
  • Theorem 1.4: Information theoretic threshold
  • Definition 1.5: Signed adjacency matrix
  • Definition 1.6: Spectral-One
  • Theorem 1.7: Failure of Spectral-One in most cases
  • Remark 1.8
  • Definition 1.9: Spectral-Two
  • Theorem 1.10: Spectral-Two always succeeds in recovering two communities
  • ...and 56 more
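
Definition 1.3 names the Chernoff–Hellinger divergence, the quantity that typically sets exact-recovery thresholds in block models. The Python sketch below computes the form introduced by Abbe and Sandon for the uncensored SBM, $\mathrm{CH}(\mu,\nu)=\max_{s\in[0,1]}\sum_i\big(s\mu_i+(1-s)\nu_i-\mu_i^{s}\nu_i^{1-s}\big)$. We have not confirmed that Definition 1.3 uses exactly this expression, or how the paper instantiates its arguments for the censored model, so the function name `chernoff_hellinger` and its inputs are illustrative assumptions.

```python
def chernoff_hellinger(mu, nu, iters=200):
    """Chernoff-Hellinger divergence in the Abbe-Sandon form:

        CH(mu, nu) = max over s in [0, 1] of
                     sum_i [ s*mu_i + (1 - s)*nu_i - mu_i**s * nu_i**(1 - s) ].

    The objective is concave in s (a linear term minus a sum of
    exponentials in s), so ternary search converges to the maximum.
    Assumes mu and nu are equal-length sequences of nonnegative numbers.
    """
    def objective(s):
        return sum(s * m + (1 - s) * v - (m ** s) * (v ** (1 - s))
                   for m, v in zip(mu, nu))

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        # For a concave objective, the maximizer lies in [a, hi] or [lo, b].
        if objective(a) < objective(b):
            lo = a
        else:
            hi = b
    return objective((lo + hi) / 2)
```

As a sanity check, for two equal-size communities with within- and across-community rates a and b (edge probabilities a·log(n)/n and b·log(n)/n), the vectors (a/2, b/2) and (b/2, a/2) give $(\sqrt{a}-\sqrt{b})^2/2$, so the condition CH > 1 reproduces the familiar exact-recovery threshold for the uncensored symmetric two-community SBM.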