
Semi-Supervised Community Detection via Quasi-Stationary Distributions

Nicolas Fraiman, Michael Nisenzon

TL;DR

This work addresses semi-supervised community detection in a two-community Partially Labeled Stochastic Block Model (PL-SBM) by leveraging the quasi-stationary distributions (QSDs) of random walks that are absorbed at revealed nodes. The authors develop a family of QSD-based estimators, including a pure QSD estimator, simple voting, and a mixed QSD approach. They prove minimax lower bounds together with upper bounds on the error rate, and the mixed method attains the optimal rate in the connected regime. The analysis rests on entrywise eigenvector concentration bounds, a generalized Davis–Kahan perturbation result for submatrices, and large-deviation bounds for differences of binomial random variables, all adapted to the partially labeled setting. Empirically, QSD-based methods improve on classical spectral approaches, especially in the bounded-degree regime, showing the practical value of injecting side information into spectral clustering through absorbing-state random walks. The work thus extends spectral methodology to quasi-stationary distributions, with sharp error guarantees and practical algorithms for semi-supervised community detection.
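A simple voting rule assigns an unrevealed node to the community with more revealed neighbors, so its per-node error probability comes down to comparing two binomial counts, which is where large-deviation bounds for binomial differences enter. The Python sketch below computes that probability exactly under illustrative assumptions: each community has m revealed nodes, a within-community edge appears with probability p and a cross-community edge with probability q, and ties are broken by a fair coin. The helpers `binom_pmf` and `voting_error` are hypothetical names, and this is not claimed to be the authors' exact voting estimator.

```python
from math import comb


def binom_pmf(m, p):
    """Probability mass function of Binomial(m, p) as a list indexed by k."""
    return [comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]


def voting_error(m, p, q):
    """Hypothetical per-node error of majority voting over revealed neighbors.

    X ~ Bin(m, p) counts revealed neighbors in the node's own community,
    Y ~ Bin(m, q) counts revealed neighbors in the other community.
    The vote is wrong if Y > X; a tie (X == Y) is broken by a fair coin.
    """
    fx = binom_pmf(m, p)
    fy = binom_pmf(m, q)
    err = 0.0
    for x, px in enumerate(fx):
        for y, py in enumerate(fy):
            if y > x:
                err += px * py
            elif y == x:
                err += 0.5 * px * py
    return err


if __name__ == "__main__":
    for m in (5, 10, 20, 40):
        print(m, voting_error(m, 0.3, 0.1))
```

When p = q the two counts have the same distribution and the error is exactly 1/2; as m grows with p > q, the error shrinks, and the large-deviation bounds describe how fast.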

Abstract

Spectral clustering is a widely used method for community detection in networks. We focus on a semi-supervised community detection scenario in the Partially Labeled Stochastic Block Model (PL-SBM) with two balanced communities, where a fixed portion of labels is known. Our approach leverages random walks in which the revealed nodes in each community act as absorbing states. By analyzing the quasi-stationary distributions associated with these random walks, we construct a classifier that distinguishes the two communities by examining differences in the associated eigenvectors. We establish upper and lower bounds on the error rate for a broad class of quasi-stationary algorithms, encompassing both spectral and voting-based approaches. In particular, we prove that this class of algorithms can achieve the optimal error rate in the connected regime. We further demonstrate empirically that our quasi-stationary approach improves performance on both real-world and simulated datasets.
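The QSD construction in the abstract can be illustrated with a small NumPy sketch, under stated assumptions. The walk uses the transition matrix P = D^{-1}A. For each community k, the revealed nodes of k are made absorbing, and the QSD is taken as the normalized left Perron eigenvector of P restricted to the remaining (transient) nodes. Walks near community k are absorbed quickly, so that QSD puts less mass there, and an unrevealed node is assigned to the community whose absorbing walk leaves it less quasi-stationary mass. The sampler `sample_pl_sbm`, the helpers `qsd` and `qsd_classify`, and this particular comparison rule are hypothetical illustrations, not the authors' pure, voting, or mixed estimators.

```python
import numpy as np


def sample_pl_sbm(n, p, q, reveal_frac, rng):
    """Hypothetical PL-SBM sampler: two balanced communities, edge probability
    p within and q across, and a fraction reveal_frac of each community revealed.
    Returns the adjacency matrix, the true labels, and partial labels (-1 = hidden)."""
    labels = np.repeat([0, 1], n // 2)
    size = len(labels)
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random((size, size)) < probs, k=1)
    adj = (upper | upper.T).astype(float)  # symmetric, no self-loops
    partial = np.full(size, -1)
    for k in (0, 1):
        members = np.flatnonzero(labels == k)
        m = int(round(reveal_frac * len(members)))
        partial[rng.choice(members, size=m, replace=False)] = k
    return adj, labels, partial


def qsd(adj, absorbing):
    """QSD of the walk P = D^{-1} A killed at the `absorbing` nodes: the
    normalized left Perron eigenvector of P restricted to the transient nodes.
    Absorbing nodes get mass 0 in the returned length-n vector."""
    deg = np.maximum(adj.sum(axis=1), 1.0)  # guard against isolated nodes
    P = adj / deg[:, None]
    transient = np.flatnonzero(~absorbing)
    Q = P[np.ix_(transient, transient)]  # substochastic submatrix
    vals, vecs = np.linalg.eig(Q.T)  # eigenvectors of Q.T = left eigenvectors of Q
    v = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))  # Perron vector, sign fixed
    nu = np.zeros(adj.shape[0])
    nu[transient] = v / v.sum()
    return nu


def qsd_classify(adj, partial):
    """Assign each node to the community whose absorbing walk leaves it the
    least quasi-stationary mass; revealed nodes keep their given labels."""
    nu = np.stack([qsd(adj, partial == k) for k in (0, 1)])
    pred = np.argmin(nu, axis=0)
    known = partial >= 0
    pred[known] = partial[known]
    return pred


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj, labels, partial = sample_pl_sbm(400, 0.10, 0.01, 0.1, rng)
    pred = qsd_classify(adj, partial)
    hidden = partial < 0
    print("error on unrevealed nodes:", np.mean(pred[hidden] != labels[hidden]))
```

Comparing the two QSDs at the same node cancels much of the degree dependence that a single QSD carries. The dense eigendecomposition here only suits small graphs; a sparse iterative eigensolver would be the natural substitute at scale.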

Paper Structure

This paper contains 16 sections, 22 theorems, 135 equations, 3 figures, 1 table.

Key Result

Theorem 1

For equivariant estimators $\hat{\sigma}$ we have

Figures (3)

  • Figure 1: Revealed and unrevealed nodes up to permutation.
  • Figure 2: Submatrices of the transition matrix up to permutation.
  • Figure 3: QSD eigenvectors on connected SBM.

Theorems & Definitions (52)

  • Definition 1: Balanced partitions and partial labels
  • Definition 2: Partially Labeled Balanced SBM
  • Definition 3: Transition submatrices and eigenvectors
  • Definition 4: Mean-field eigenvector
  • Definition 5: Error rate
  • Definition 6: Equivariance
  • Theorem 1
  • Proof
  • Definition 7
  • Definition 8
  • ...and 42 more