
An Overview of Asymptotic Normality in Stochastic Blockmodels: Cluster Analysis and Inference

Joshua Agterberg, Joshua Cape

TL;DR

This survey organizes the asymptotic normality results available for stochastic blockmodels and their variants, linking classical statistical ideas to modern spectral methods for networks. It enumerates rowwise and latent-space central limit theorems for adjacency and Laplacian embeddings, contrasts population and empirical behavior, and connects these limits to parameter estimation and hypothesis testing. By detailing how sparsity, block separation, and low-rank structure shape the Gaussian limits, the paper provides a unified framework for inference on network data and highlights open problems in sparse, heterogeneous, and multi-graph settings. The results bear directly on constructing confidence regions, validating community structure, and performing two-graph and multi-graph comparisons in network science applications.

Abstract

This paper provides a selective review of the statistical network analysis literature focused on clustering and inference problems for stochastic blockmodels and their variants. We survey asymptotic normality results for stochastic blockmodels as a means of thematically linking classical statistical concepts to contemporary research in network data analysis. Of note, multiple different forms of asymptotically Gaussian behavior arise in stochastic blockmodels and are useful for different purposes, pertaining to estimation and testing, the characterization of cluster structure in community detection, and understanding latent space geometry. This paper concludes with a discussion of open problems and ongoing research activities addressing asymptotic normality and its implications for statistical network modeling.

Paper Structure

This paper contains 23 sections, 33 theorems, 128 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

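Suppose $B \in [0,1]^{K \times K}$ has rank $K$, and let the columns of $U \in \mathbb{R}^{n \times K}$ be the $K$ leading orthonormal eigenvectors of $P = \rho_{n} Z B Z^{\top}$, where $Z \in \{0,1\}^{n \times K}$ is the community membership matrix. Then there exists a $K \times K$ matrix $R$ such that $U = Z R$. Furthermore, in Euclidean norm, $\| R_{k*} - R_{l*}\| = \sqrt{n_{k}^{-1} + n_{l}^{-1}}$ for all $k \neq l$, where $n_{k}$ denotes the number of nodes in community $k$.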
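
The lemma can be checked numerically on a small example. The sketch below is illustrative only: the function name `lemma1_check`, the community sizes, and the entries of `B_example` are assumptions chosen for the demonstration, not values from the paper. It builds $P = \rho_{n} Z B Z^{\top}$, extracts the $K$ leading eigenvectors, recovers $R$ as the block averages of the rows of $U$, and compares the row distances of $R$ with $\sqrt{n_{k}^{-1} + n_{l}^{-1}}$.

```python
import numpy as np

def lemma1_check(sizes, B, rho=1.0):
    """Return (Z, U, R) for P = rho * Z B Z^T with the given community sizes."""
    sizes = np.asarray(sizes)
    K = len(sizes)
    labels = np.repeat(np.arange(K), sizes)   # community label of each node
    Z = np.eye(K)[labels]                     # n x K membership matrix
    P = rho * Z @ B @ Z.T                     # population edge-probability matrix
    evals, evecs = np.linalg.eigh(P)
    # Keep the K eigenvectors whose eigenvalues are largest in magnitude.
    lead = np.argsort(np.abs(evals))[::-1][:K]
    U = evecs[:, lead]
    # Rows of U are constant within a community, so R = (Z^T Z)^{-1} Z^T U.
    R = (Z.T @ U) / sizes[:, None]
    return Z, U, R

# Hypothetical full-rank connectivity matrix and community sizes.
B_example = np.array([[0.50, 0.10, 0.05],
                      [0.10, 0.40, 0.10],
                      [0.05, 0.10, 0.30]])
sizes_example = [30, 20, 10]

Z, U, R = lemma1_check(sizes_example, B_example, rho=0.5)
for k in range(3):
    for l in range(k + 1, 3):
        observed = np.linalg.norm(R[k] - R[l])
        predicted = np.sqrt(1 / sizes_example[k] + 1 / sizes_example[l])
        print(k, l, observed, predicted)
```

The distance identity follows from $U^{\top} U = I$: substituting $U = Z R$ gives $R^{\top} (Z^{\top} Z) R = I$, so $R R^{\top} = (Z^{\top} Z)^{-1} = \mathrm{diag}(n_{1}^{-1}, \dots, n_{K}^{-1})$, which makes the rows of $R$ mutually orthogonal with squared norms $n_{k}^{-1}$.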

Figures (2)

  • Figure 1: Scatter plot of the adjacency matrix eigenvector components for an SBM with two communities and $n = 10^{4}$ nodes. See the section on community detection for details.
  • Figure 2: Comparison of the first $n/2$ rows of $n(\widehat{U} W - U)$ for two different $B$ matrices with the same eigenvectors. The condition numbers of the asymptotic covariance matrices are $10.36$ and $1236.59$, respectively.
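
The quantities plotted in these figures can be reproduced in outline with a short simulation. The following is a minimal sketch under stated assumptions: the helper names (`sample_sbm`, `top_eigvecs`, `procrustes`), the matrix `B`, the seed, and the choice $n = 400$ are all hypothetical and do not come from the paper, and $W$ is taken here to be the orthogonal Procrustes alignment of $\widehat{U}$ to $U$, which is one common convention. It samples an undirected, hollow two-community SBM, computes the leading eigenvectors $\widehat{U}$ of the adjacency matrix and $U$ of $P$, and forms the scaled residual $n(\widehat{U} W - U)$.

```python
import numpy as np

def sample_sbm(labels, B, rng):
    """Sample a symmetric, hollow adjacency matrix with edge probabilities B[z_i, z_j]."""
    P = B[np.ix_(labels, labels)]                    # n x n matrix Z B Z^T
    n = len(labels)
    upper = np.triu(rng.random((n, n)) < P, k=1)     # independent edges above the diagonal
    A = (upper | upper.T).astype(float)              # symmetrize; diagonal stays zero
    return A, P

def top_eigvecs(M, K):
    """Eigenvectors of symmetric M for the K eigenvalues largest in magnitude."""
    evals, evecs = np.linalg.eigh(M)
    lead = np.argsort(np.abs(evals))[::-1][:K]
    return evecs[:, lead]

def procrustes(U_hat, U):
    """Orthogonal W minimizing the Frobenius norm of U_hat @ W - U."""
    X, _, Yt = np.linalg.svd(U_hat.T @ U)
    return X @ Yt

rng = np.random.default_rng(0)                       # fixed seed, for reproducibility only
n, K = 400, 2
labels = np.repeat([0, 1], n // 2)                   # two equal-sized communities
B = np.array([[0.5, 0.2],
              [0.2, 0.4]])                           # hypothetical connectivity matrix

A, P = sample_sbm(labels, B, rng)
U = top_eigvecs(P, K)
U_hat = top_eigvecs(A, K)
W = procrustes(U_hat, U)

residual = n * (U_hat @ W - U)                       # scaled rowwise fluctuations
first_block = residual[: n // 2]                     # rows belonging to community 0
print(np.cov(first_block, rowvar=False))             # empirical 2 x 2 covariance
```

Plotting the rows of `first_block` as points in the plane gives a cloud of the kind compared in Figure 2, and plotting the rows of `U_hat` colored by `labels` gives a scatter of the kind shown in Figure 1.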

Theorems & Definitions (42)

  • Definition 1: Stochastic blockmodel --- undirected, hollow
  • Remark 1: Presence or absence of self-edges
  • Remark 2: Fixed community memberships and the expected adjacency matrix
  • Definition 2: Mixed-membership stochastic blockmodel --- undirected, loopy
  • Definition 3: Degree-corrected stochastic blockmodel --- undirected, hollow
  • Lemma 1: Restatement of Lemma 2.1 of lei_consistency_2015
  • Lemma 2: Restatement of Lemma 2 of zhang_randomized_2022
  • Lemma 3: Lemma 3.1 of rohe_spectral_2011
  • Theorem 1: Specialization of Theorem 4.2.1 in chen_spectral_2021 for SBMs
  • Theorem 2: Restatement of Theorem 4.4 in xie_entrywise_2024 for positive semidefinite SBMs
  • ...and 32 more