Network Goodness-of-Fit for the block-model family

Jiashun Jin, Zheng Tracy Ke, Jiajun Tang, Jingming Wang

TL;DR

GoF-MSCORE is proposed as a new Goodness-of-Fit (GoF) metric for DCMM (the broadest of the four models in the block-model family). It rests on two main ideas: using cycle count statistics as a general recipe for GoF, and a novel network fitting scheme.

Abstract

The block-model family has four popular network models (SBM, DCBM, MMSBM, and DCMM). A fundamental question is how well each of these models fits real networks. We propose GoF-MSCORE as a new Goodness-of-Fit (GoF) metric for DCMM (the broadest of the four), with two main ideas. The first is to use cycle count statistics as a general recipe for GoF. The second is a novel network fitting scheme. GoF-MSCORE is a flexible GoF approach, and we further extend it to SBM, DCBM, and MMSBM. This gives rise to a series of GoF metrics covering each of the four models in the block-model family. We show that for each of the four models, if the assumed model is correct, then the corresponding GoF metric converges to $N(0, 1)$ as the network size diverges. We also analyze the power and show that these metrics are optimal in many settings. In comparison, many other GoF ideas face challenges: they may lack a parameter-free limiting null, be non-optimal in power, or face an analytical hurdle. Note that a parameter-free limiting null is especially desirable, as many network models have a large number of unknown parameters. The limiting null of each of our GoF metrics is always $N(0,1)$, which is parameter-free as desired. For 12 frequently used real networks, we use the proposed GoF metrics to show that DCMM fits almost all of them well. We also show that SBM, DCBM, and MMSBM do not fit many of these networks well, especially when the networks are relatively large. To complement our study on GoF, we also show that the DCMM is nearly as broad as the rank-$K$ network model. Based on these results, we recommend the DCMM as a promising model for undirected networks.

Paper Structure

This paper contains 60 sections, 34 theorems, 399 equations, 6 figures, 4 tables, 4 algorithms.

Key Result

Theorem 1.1

First, $\tau_1 = 1$. Second, if $K = 2$, or if $K \geq 3$ and the above conditions hold, then there are matrices $(\Theta, \Pi, P)$ as in the DCMM such that $\Omega = \Theta \Pi P \Pi' \Theta$.
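
The factorization $\Omega = \Theta \Pi P \Pi' \Theta$ in Theorem 1.1 can be illustrated numerically. The following is a minimal sketch, not the authors' code: the function names `dcmm_omega` and `sample_adjacency`, the parameter values, and the choice of a $P$ with unit diagonal are all assumptions made for illustration. It builds $\Omega$ from a degree vector (the diagonal of $\Theta$), a membership matrix $\Pi$ whose rows are weight vectors, and a $K \times K$ matrix $P$, then draws a symmetric adjacency matrix with independent Bernoulli edges above the diagonal.

```python
import numpy as np

def dcmm_omega(theta, Pi, P):
    """Return Omega = Theta @ Pi @ P @ Pi.T @ Theta, where Theta = diag(theta)."""
    theta = np.asarray(theta, dtype=float)
    Pi = np.asarray(Pi, dtype=float)
    P = np.asarray(P, dtype=float)
    core = Pi @ P @ Pi.T                      # n x n matrix Pi P Pi'
    # Multiplying by diag(theta) on both sides is a row and column rescaling.
    return theta[:, None] * core * theta[None, :]

def sample_adjacency(Omega, rng):
    """Draw a symmetric 0/1 adjacency matrix with zero diagonal.

    Edges above the diagonal are independent Bernoulli(Omega[i, j]).
    """
    n = Omega.shape[0]
    upper = np.triu(rng.random((n, n)) < Omega, k=1)
    A = upper.astype(int)
    return A + A.T

# Illustrative (hypothetical) parameters, not taken from the paper.
rng = np.random.default_rng(0)
n, K = 200, 3
theta = rng.uniform(0.3, 1.0, size=n)         # degree heterogeneity parameters
Pi = rng.dirichlet(np.ones(K), size=n)        # mixed memberships, rows sum to 1
P = np.full((K, K), 0.2) + 0.8 * np.eye(K)    # unit diagonal, off-diagonal 0.2

Omega = dcmm_omega(theta, Pi, P)
A = sample_adjacency(Omega, rng)
```

With these choices every entry of $\Omega$ lies in $[0, 1]$, so it can serve as a matrix of edge probabilities, and $\Omega$ has rank at most $K$.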

Figures (6)

  • Figure A1: Left: Comparison of models (Section \ref{subsec:model}). Right: Histograms of four GoF metrics (based on 1000 networks generated from DCMM; black curve: density of $N(0,1)$; see Section \ref{subsec:contribution}). The results suggest that DCMM fits well with the networks, but SBM, DCBM, and MMSBM do not.
  • Figure A2: GoF-MSCORE for DCMM (flow chart).
  • Figure A3: Estimated weight $\hat{w}$ in NC for some authors ($(1 - \hat{w})$ is the weight in CH).
  • Figure A4: Histograms of the GoF metrics in Experiments 1.1-1.4, where the off-diagonal entries of $P$ are equal to $0.05$. In each panel, the networks are generated from a true model in the block-model family, the four colors correspond to the four GoF metrics, and the black curve is the density of $N(0,1)$.
  • Figure A5: Histograms of the GoF metrics in Experiments 1.1-1.4, where the off-diagonal entries of $P$ are equal to $0.2$. In each panel, the networks are generated from a true model in the block-model family, the four colors correspond to the four GoF metrics, and the black curve is the density of $N(0,1)$.
  • ...and 1 more figure

Theorems & Definitions (48)

  • Definition 1.1
  • Theorem 1.1: NMF
  • Definition 2.1
  • Theorem 2.1: Parameter-free limiting null (oracle case)
  • Corollary 2.1
  • Lemma 2.1
  • Lemma 3.1: The population quantities
  • Lemma 3.2
  • Definition 3.1
  • Theorem 3.1: MSCORE and net-rounding
  • ...and 38 more