Consistent model selection in a collection of stochastic block models

Lucie Arts

TL;DR

The penalized Krichevsky-Trofimov estimator is shown to be consistent: it converges to the correct number of clusters in both multi-layer and dynamic stochastic block models as the number of nodes in the networks increases.

Abstract

We introduce the penalized Krichevsky-Trofimov (KT) estimator as a convergent method for estimating the number of node clusters when observing multiple networks within both multi-layer and dynamic Stochastic Block Models. We establish the consistency of the KT estimator, showing that it converges to the correct number of clusters in both types of models as the number of nodes in the networks increases. Our estimator does not require a known upper bound on this number to be consistent. Furthermore, we show that these consistency results hold in both dense and sparse regimes, making the penalized KT estimator robust across various network configurations. We illustrate its performance on synthetic datasets.
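As background for the name, the classical Krichevsky-Trofimov distribution assigns to a binary sequence with $a$ ones and $b$ zeros the probability $\Gamma(a+\tfrac12)\,\Gamma(b+\tfrac12) / (\pi\,\Gamma(a+b+1))$, which is a Bernoulli likelihood averaged over a Beta(1/2, 1/2) prior. The sketch below computes only this building block. The function names and the per-block summation are illustrative assumptions; the paper's penalty term and its maximization over node labellings are not reproduced here.

```python
import math


def kt_log_prob(ones: int, zeros: int) -> float:
    """Log of the classical KT probability of a binary sequence.

    Depends only on the counts of ones and zeros:
    Gamma(ones + 1/2) * Gamma(zeros + 1/2) / (pi * Gamma(ones + zeros + 1)).
    """
    return (
        math.lgamma(ones + 0.5)
        + math.lgamma(zeros + 0.5)
        - math.lgamma(ones + zeros + 1.0)
        - math.log(math.pi)
    )


def kt_log_prob_blocks(block_counts) -> float:
    """Sum KT log-probabilities over blocks (hypothetical helper).

    block_counts is an iterable of (edges, non_edges) pairs, one per pair
    of communities, under an assumed fixed labelling of the nodes.
    """
    return sum(kt_log_prob(e, m) for e, m in block_counts)
```

For instance, `math.exp(kt_log_prob(1, 1))` equals 1/8, which matches the sequential form of the KT rule: probability 1/2 for the first symbol, then (0 + 1/2) / (1 + 1) = 1/4 for a second symbol of the other type.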

Paper Structure

This paper contains 26 sections, 13 theorems, 131 equations, 5 figures, 1 table.

Key Result

Theorem 1

Consider the multi-layer SBM with $T$ layers and $k_{0}$ communities (resp. the dynamic SBM with $T$ time points and $k_{0}$ communities). Let $\hat{k}$ be defined as in Definition 2 (resp. Definition 3). Then, under both the sparse regime (with $n\rho_n^t = \Omega(\log n)$ for all $1 \leq t \leq T$) and the dense regime, $\hat{k} = k_{0}$ eventually almost surely as $n \rightarrow \infty$, with $T$ remaining fixed.
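To make the setting of the theorem concrete, the following sketch samples a collection of $T$ undirected networks on $n$ nodes in a sparse regime where $n\rho_n = c \log n$. It rests on several assumptions that are not taken from the paper: community labels are shared across layers, the same sparsity factor is used for every layer, and each layer's connectivity matrix is drawn uniformly at random. The function name and its parameters are hypothetical.

```python
import numpy as np


def sample_multilayer_sbm(n, k, T, c=2.0, seed=0):
    """Sample T symmetric adjacency matrices with shared community labels."""
    rng = np.random.default_rng(seed)
    # Assumption: one label per node, common to all layers.
    labels = rng.integers(0, k, size=n)
    # Sparsity factor chosen so that n * rho = c * log(n).
    rho = min(1.0, c * np.log(n) / n)
    layers = np.zeros((T, n, n), dtype=np.int8)
    for t in range(T):
        # Hypothetical symmetric k x k connectivity pattern for layer t.
        S = rng.uniform(0.2, 1.0, size=(k, k))
        S = np.triu(S) + np.triu(S, 1).T
        # n x n matrix of edge probabilities: entry (i, j) is rho * S[z_i, z_j].
        P = rho * S[labels][:, labels]
        # Draw the strict upper triangle, then symmetrize (no self-loops).
        upper = np.triu(rng.random((n, n)) < P, 1)
        layers[t] = upper + upper.T
    return labels, layers
```

Calling `sample_multilayer_sbm(200, 3, 4)` returns a label vector of length 200 and an array of shape `(4, 200, 200)`. Setting `rho` to 1 instead would correspond to a dense regime under the same assumptions.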

Figures (5)

  • Figure 1: Matrix $P^{0,t}$ where $u_{1}, u_{2}, u_{3}, u_{4} \overset{\text{i.i.d.}}{\sim} \mathcal{U}(0.6, 1)$.
  • Figure 2: Comparison of the accuracy of different methods: the penalized Krichevsky-Trofimov estimator (KT), the penalized maximum likelihood (PML), the Bethe-Hessian matrix with moment correction (BHMC) and the network cross-validation method (NCV).
  • Figure 3: Illustration of convergence rates of the penalized KT estimator for different numbers of layers.
  • Figure 4: Computation time of the different methods.
  • Figure 5: Clustering accuracy of the MLSBM estimator when data are generated from a DynSBM, as a function of the number of nodes.

Theorems & Definitions (25)

  • Definition 1: Order of the model
  • Remark 1
  • Definition 2: Penalized KT estimator for MLSBM
  • Definition 3: Penalized KT estimator for DynSBM
  • Remark 2
  • Remark 3
  • Theorem 1
  • Remark 4
  • Remark 5
  • Proposition 1
  • ...and 15 more