A spectral inference method for determining the number of communities in networks

Yujia Wu, Xiucai Ding, Jingfei Zhang, Wei Lan, Chih-Ling Tsai

TL;DR

This paper proposes a model-free spectral inference method based on eigengap ratios that is straightforward to compute, requires no parameter tuning, and can be applied to a wide range of block models without the need to estimate network distribution parameters.

Abstract

To characterize the community structure in network data, researchers have developed various block-type models, including the stochastic block model, the degree-corrected stochastic block model, the mixed membership block model, the degree-corrected mixed membership block model, and others. A critical step in applying these models effectively is determining the number of communities in the network. However, to the best of our knowledge, existing methods for estimating the number of network communities either rely on explicit model fitting or fail to simultaneously accommodate network sparsity and a diverging number of communities. In this paper, we propose a model-free spectral inference method based on eigengap ratios that addresses these challenges. The inference procedure is straightforward to compute, requires no parameter tuning, and can be applied to a wide range of block models without the need to estimate network distribution parameters. Furthermore, it is effective for both dense and sparse networks with a diverging number of communities. Technically, we show that the proposed spectral test statistic converges to a function of the type-I Tracy-Widom distribution via the Airy kernel under the null hypothesis, and that the test is asymptotically powerful under weak alternatives. Simulation studies on both dense and sparse networks demonstrate the efficacy of the proposed method. Three real-world examples are presented to illustrate the usefulness of the proposed test.
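To make the idea of an eigengap ratio concrete, the following is a minimal sketch, not the paper's test statistic. It assumes an assortative stochastic block model, sorts the adjacency eigenvalues in descending order, and forms the ratio of consecutive gaps, $(\lambda_k-\lambda_{k+1})/(\lambda_{k+1}-\lambda_{k+2})$. The function names `simulate_sbm` and `eigengap_ratios`, the connectivity matrix `B`, and all numeric settings are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_sbm(sizes, B, rng):
    """Draw a symmetric 0/1 adjacency matrix without self-loops from an SBM.

    sizes: community sizes; B: K x K matrix of connection probabilities.
    """
    labels = np.repeat(np.arange(len(sizes)), sizes)
    P = B[np.ix_(labels, labels)]  # edge probability for every node pair
    upper = np.triu((rng.random(P.shape) < P).astype(float), k=1)
    return upper + upper.T


def eigengap_ratios(A, k_max):
    """Ratios of consecutive eigengaps for the k_max leading eigenvalues.

    Entry k-1 is (lam_k - lam_{k+1}) / (lam_{k+1} - lam_{k+2}), with
    eigenvalues sorted in descending order. This ordering is an
    illustrative assumption, not necessarily the paper's definition.
    """
    eigvals = np.linalg.eigvalsh(A)[::-1]  # eigvalsh returns ascending order
    gaps = eigvals[:-1] - eigvals[1:]
    return gaps[:k_max] / gaps[1:k_max + 1]


# Hypothetical example: three communities of 200 nodes each.
rng = np.random.default_rng(0)
B = np.array([[0.7, 0.1, 0.1],
              [0.1, 0.5, 0.1],
              [0.1, 0.1, 0.3]])
A = simulate_sbm([200, 200, 200], B, rng)
ratios = eigengap_ratios(A, k_max=6)
k_hat = int(np.argmax(ratios)) + 1  # naive point estimate, not a formal test
print("eigengap ratios:", np.round(ratios, 2))
print("index of largest ratio:", k_hat)
```

In a well-separated network like this one, the ratio at the true number of communities tends to be much larger than the others, because the gap between the last signal eigenvalue and the noise bulk is divided by a small gap inside the bulk. The paper's procedure turns this intuition into a hypothesis test with a known null distribution; the naive argmax above is only a stand-in.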

Paper Structure (25 sections, 11 theorems, 81 equations, 2 figures, 10 tables, 2 algorithms)

Key Result

Theorem 1

Assume that $n^{1/3}\max_{i, j}P_{ij}/\mathsf{K}^2\to\infty$ as $n\to\infty$, and $\min_{1\le k\le \mathsf{K}}|\lambda_{k}(P)|\ge cn\max_{i, j}P_{ij}/\mathsf{K}$ for some constant $c>0$. Then under the null hypothesis $\mathbf{H}_0$ in (test-hypo), when $\mathsf{K}_{\max}-\mathsf{K}_0$ is finite, we have [...], where $W$ is a GOE matrix.

Figures (2)

  • Figure S.1: Computation time (in seconds) for dense and sparse SBMs. Networks have equally sized communities with $n \in \{3,000, 6,000, 9,000\}$, $\mathsf{K}_0 =5$, and $\mathsf{C}=2, 5, 10$.
  • Figure S.2: Feasible region for $n^{1/3}\max_{i, j}P_{ij} / \mathsf{K}^2 =3n^{\epsilon_1}$, where $n=3,000$ and $\epsilon_1=0.005$. The shaded blue area represents the feasible values of ${\mathsf{K}}$ corresponding to each $\max_{i, j}P_{ij}$, while the solid blue line shows the maximal value that ${\mathsf{K}}$ can take.
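The boundary described in the Figure S.2 caption can be worked out directly: solving $n^{1/3}\max_{i,j}P_{ij}/\mathsf{K}^2 = 3n^{\epsilon_1}$ for $\mathsf{K}$ gives $\mathsf{K} = \sqrt{n^{1/3-\epsilon_1}\max_{i,j}P_{ij}/3}$. The sketch below evaluates that boundary for the caption's values $n=3,000$ and $\epsilon_1=0.005$. The function name `max_k_boundary` is hypothetical, and reading the feasible region as the values of $\mathsf{K}$ at or below this curve is an assumption based on the caption's wording.

```python
import math


def max_k_boundary(p_max, n=3000, eps1=0.005):
    """Solve n^(1/3) * p_max / K^2 = 3 * n^eps1 for K.

    p_max plays the role of max_{i,j} P_ij. Treating this curve as the
    upper limit of feasible K is an assumption taken from the caption.
    """
    return math.sqrt(n ** (1.0 / 3.0 - eps1) * p_max / 3.0)


# Evaluate the boundary on a few illustrative edge-probability levels.
for p_max in (0.05, 0.1, 0.5, 1.0):
    print(f"max P_ij = {p_max:4.2f}  ->  boundary K = {max_k_boundary(p_max):.3f}")
```

Because the boundary scales as the square root of $\max_{i,j}P_{ij}$, sparser networks admit a smaller range of $\mathsf{K}$ under this particular equality.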

Theorems & Definitions (25)

  • Remark 1
  • Theorem 1
  • Remark 2
  • Remark 3
  • Theorem 2
  • Corollary 1
  • Corollary 2
  • proof : Proof of Theorem 1
  • Lemma S.1
  • proof : Proof
  • ...and 15 more