Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels

Vishesh Karwa, Debdeep Pati, Sonja Petrović, Liam Solus, Nikita Alexeev, Mateja Raič, Dane Wilburne, Robert Williams, Bowei Yan

TL;DR

The general testing methodology developed here extends to any finite mixture of log-linear models on discrete data, and as such is the first application of the algebraic statistics machinery for latent-variable models.

Abstract

We construct Bayesian and frequentist finite-sample goodness-of-fit tests for three different variants of the stochastic blockmodel for network data. Since all of the stochastic blockmodel variants are log-linear in form when block assignments are known, the tests for the latent block model versions combine a block membership estimator with the algebraic statistics machinery for testing goodness-of-fit in log-linear models. We describe Markov bases and marginal polytopes of the variants of the stochastic blockmodel, and discuss how both facilitate the development of goodness-of-fit tests and understanding of model behavior. The general testing methodology developed here extends to any finite mixture of log-linear models on discrete data, and as such is the first application of the algebraic statistics machinery for latent-variable models.

Paper Structure

This paper contains 32 sections, 7 theorems, 35 equations, 10 figures, 7 tables, 2 algorithms.

Key Result

Proposition 3.3

Consider an exponential family model as in Definition 3.1. Let $u$ denote a fixed value of the sufficient statistics. Then the conditional distribution on the fiber is

Figures (10)

  • Figure 1: Two graphs $g$ and $h$ in the same fiber of the ER-SBM with block structure $\mathbb{B}=\{B_1, B_2, B_3\}$ and a Markov move corresponding to the linear form $x_{23}-x_{15}\in\ker(\varphi_{\mathrm{ER}})$ that moves from $g$ to $h$. The blue, loosely dashed line indicates edge insertion and the red, densely dashed line indicates edge deletion.
  • Figure 2: Two graphs $g$ and $h$ in the same fiber of the additive SBM with block structure $\mathbb{B}=\{B_1,B_2,B_3\}$ and a quadratic Markov move corresponding to the binomial $x_{26}x_{45}-x_{24}x_{56}\in\ker(\varphi_{\mathrm{Add}})$ that moves from $g$ to $h$. The blue, loosely dashed lines indicate edge insertion and the red, densely dashed lines indicate edge deletion.
  • Figure 3: Two graphs $g$ and $h$ in the same fiber of the $\beta$-SBM with block structure $\mathbb{B}=\{B_1,B_2,B_3\}$ and a cubic Markov move corresponding to the binomial $x_{23}x_{45}x_{46}-x_{24}x_{34}x_{56}\in\ker(\varphi_\beta)$ that moves from $g$ to $h$. The blue, loosely dashed lines indicate edge insertion and the red, densely dashed lines indicate edge deletion.
  • Figure 4: Histograms of the GoF statistic for testing the fit of ER-SBM with $2$ blocks. Data generated from a $2$-block additive SBM.
  • Figure 5: Histograms of the GoF statistic for testing the fit of a $2$-block ER-SBM. Data generated from a $2$-block $\beta$-SBM.
  • ...and 5 more figures
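
The linear Markov move in Figure 1 can be illustrated with a minimal Python sketch. It assumes that the sufficient statistic of the ER-SBM with known blocks is the number of edges between each unordered pair of blocks, so that a move deletes one edge and inserts a non-edge joining the same pair of blocks. The block assignment `block_of`, the starting graph `g`, and the helper names `block_pair`, `sufficient_stats`, and `er_sbm_move` are hypothetical choices made for illustration; they are not taken from the paper.

```python
import random
from itertools import combinations


def block_pair(i, j, block_of):
    """Unordered pair of block labels for the dyad {i, j}."""
    a, b = block_of[i], block_of[j]
    return (a, b) if a <= b else (b, a)


def sufficient_stats(edges, block_of):
    """Edge counts per unordered block pair (assumed ER-SBM statistic)."""
    counts = {}
    for i, j in edges:
        key = block_pair(i, j, block_of)
        counts[key] = counts.get(key, 0) + 1
    return counts


def er_sbm_move(edges, block_of, rng):
    """Delete one edge and insert a non-edge joining the same block pair."""
    nodes = sorted(block_of)
    edge = rng.choice(sorted(edges))
    key = block_pair(edge[0], edge[1], block_of)
    # Non-edges whose endpoints lie in the same pair of blocks as `edge`.
    candidates = [
        d for d in combinations(nodes, 2)
        if d not in edges and block_pair(d[0], d[1], block_of) == key
    ]
    if not candidates:
        return set(edges)  # no valid swap: stay at the current graph
    new_edge = rng.choice(candidates)
    return (set(edges) - {edge}) | {new_edge}


# Hypothetical 6-node example with three blocks; edges are stored as (i, j), i < j.
block_of = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2}
g = {(1, 5), (3, 4), (4, 6)}
h = er_sbm_move(g, block_of, random.Random(0))

# Both graphs have the same block-pair edge counts, so they share a fiber.
assert sufficient_stats(g, block_of) == sufficient_stats(h, block_of)
```

Under this assumed block assignment, the dyads $\{2,3\}$ and $\{1,5\}$ join the same pair of blocks, which matches the form $x_{23}-x_{15}$ in the caption. Repeating such moves gives a random walk on the fiber, and the goodness-of-fit statistic can be evaluated at each visited graph.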

Theorems & Definitions (23)

  • Definition 2.1: ER-SBM
  • Definition 2.2: Additive SBM
  • Remark 2.3
  • Definition 2.4: $\beta$-SBM
  • Remark 2.5
  • Definition 3.1: A fiber for a discrete exponential family
  • Remark 3.2
  • Proposition 3.3
  • proof
  • Proposition 3.4
  • ...and 13 more