On inference for modularity statistics in structured networks

Anirban Mitra, Konasale Prasad, Joshua Cape

TL;DR

This article formulates and studies several modularity statistic variants and establishes asymptotic distributional results for them in the large-network limit for networks exhibiting nodal community structure. The results can be used in conjunction with existing theoretical guarantees for stochastic blockmodel random graphs.

Abstract

This paper revisits the classical concept of network modularity and its spectral relaxations used throughout graph data analysis. We formulate and study several modularity statistic variants for which we establish asymptotic distributional results in the large-network limit for networks exhibiting nodal community structure. Our work facilitates testing for network differences and can be used in conjunction with existing theoretical guarantees for stochastic blockmodel random graphs. Our results are enabled by recent advances in the study of low-rank truncations of large network adjacency matrices. We provide confirmatory simulation studies and real data analysis pertaining to the network neuroscience study of psychosis, specifically schizophrenia. Collectively, this paper contributes to the limited existing literature to date on statistical inference for modularity-based network analysis. Supplemental materials for this article are available online.

Paper Structure (28 sections, 6 theorems, 29 equations, 14 figures, 4 tables)

Key Result

Theorem 3.1

For $n \ge 1$, let $\mathbf{A}^{(n)} \sim \operatorname{SBM}(\mathbf{B}, \boldsymbol{\pi})$ be a sequence of stochastic blockmodel graphs with sparsity factor $\rho_{n}$ satisfying $n\rho_{n} = \omega(\log n)$. Then, as $n \rightarrow \infty$, the likelihood variant of modularity in like_mod satisfies ... The matrix $\mathbf{D}$ is specified in D-form and depends on whether $\rho_{n} \equiv 1$ or ...
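The setting of the theorem can be illustrated with a minimal simulation sketch. It samples one graph from $\operatorname{SBM}(\mathbf{B}, \boldsymbol{\pi})$ with a sparsity factor and evaluates the classical Newman-Girvan modularity at the true community labels. This is an assumption-laden illustration: the function names, the values of `B`, `pi`, `rho`, and `n`, and the use of the classical modularity are all hypothetical choices made here, and the code does not implement the paper's likelihood-based, spectral-based, or residual-based statistics, whose definitions are not reproduced in this summary.

```python
import numpy as np


def sample_sbm(n, B, pi, rho=1.0, seed=None):
    """Sample an undirected, loop-free stochastic blockmodel graph.

    Labels z_i are drawn i.i.d. from pi; nodes i < j are joined
    independently with probability rho * B[z_i, z_j].
    (Hypothetical helper, not taken from the paper.)
    """
    rng = np.random.default_rng(seed)
    B = np.asarray(B, dtype=float)
    z = rng.choice(len(pi), size=n, p=pi)      # community labels
    P = rho * B[np.ix_(z, z)]                  # n x n edge probabilities
    upper = np.triu(rng.random((n, n)) < P, k=1)  # strict upper triangle
    A = (upper | upper.T).astype(int)          # symmetrize, zero diagonal
    return A, z


def newman_girvan_modularity(A, z):
    """Classical modularity: (1/2m) * sum_ij (A_ij - d_i d_j / 2m) 1[z_i = z_j]."""
    d = A.sum(axis=1)                          # node degrees
    two_m = d.sum()                            # twice the number of edges
    same = z[:, None] == z[None, :]            # same-community indicator
    return float(((A - np.outer(d, d) / two_m) * same).sum() / two_m)


# Assumed, purely illustrative parameters: two assortative communities.
B = [[0.5, 0.1], [0.1, 0.5]]
pi = [0.5, 0.5]
A, z = sample_sbm(n=300, B=B, pi=pi, rho=1.0, seed=0)
Q = newman_girvan_modularity(A, z)
print(f"classical modularity at true labels: {Q:.3f}")
```

Repeating the draw over many seeds gives an empirical distribution of the statistic, which is the kind of object the simulation figures compare against a limiting density; taking `rho` below 1 mimics a sparser regime.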

Figures (14)

  • Figure 1: Dense networks in \ref{eg1} with $n = 300$ nodes. Left plot shows $\rho_{n}^{-1/2}n^{-1}Q_{\operatorname{L}}$, and right plot shows $\rho_{n}^{-1/2}n^{-1}Q_{\operatorname{S}}$. Dashed vertical line shows bias in simulation. Solid vertical line shows population bias. Solid curve shows population density fit.
  • Figure 2: Sparse networks in \ref{eg1} for $n \in \{300,600,1800,6000\}$ nodes. Left panel shows $\rho_{n}^{-1/2}n^{-1}Q_{\operatorname{L}}$, and right panel shows $\rho_{n}^{-1/2}n^{-1}Q_{\operatorname{S}}$. Dashed vertical line shows bias in simulation. Solid vertical line shows population bias. Solid curve shows population density fit.
  • Figure 3: Dense networks in \ref{eg2} with $n \in \{400,800,1000,4000\}$ nodes. Left panel shows $\rho_{n}^{-1/2}n^{-1}Q_{\operatorname{L}}$, and right panel shows $\rho_{n}^{-1/2}n^{-1}Q_{\operatorname{S}}$. Dashed vertical line shows bias in simulation. Solid vertical line shows population bias. Solid curve shows population density fit.
  • Figure 4: Sparse networks and residual-based modularity in \ref{eg3}. Dashed vertical line shows bias in simulation. Solid vertical line shows population bias. Solid curve shows population density fit.
  • Figure 5: Asymptotic bias and variance plotted as functions of $(p,q)$ where $\mathbf{B} = \begin{bmatrix} p^{2} & pq \\ pq & q^{2} \end{bmatrix}$ and $\boldsymbol{\pi} = [1/4, 3/4]^{\top}$.
  • ...and 9 more figures

Theorems & Definitions (10)

  • Definition 2.1: Stochastic blockmodel random graphs
  • Definition 3.1: Likelihood-based, Spectral-based and Residual-based modularities
  • Theorem 3.1: Limiting distribution for likelihood-based modularity
  • Theorem 3.2: Limiting distribution for spectral-based modularity
  • Theorem 3.3: Limiting distribution for residual-based modularity
  • Definition 3.2: Maximum likelihood estimator and spectral estimator
  • Lemma A.1: bickel2013asymptotic
  • Lemma A.2: tang2022asymptotically
  • Lemma A.3: tang2022asymptotically
  • Proof: Proofs of Theorems 3.1, 3.2, and 3.3