
Mesoscale two-sample testing for networks

Peter W. MacDonald, Elizaveta Levina, Ji Zhu

Abstract

Networks arise naturally in many scientific fields as a representation of pairwise connections. Statistical network analysis has most often considered a single large network, but it is common in a number of applications to observe multiple networks on a shared node set. When these networks are grouped by case-control status or another categorical covariate, the classical statistical question of two-sample comparison arises. In this work, we address the problem of testing for statistically significant differences in a given arbitrary subset of connections. This general framework allows an analyst to focus on a single node or a specific region of interest, or to compare whole networks. Our ability to conduct "mesoscale" testing on a meaningful group of edges is particularly relevant for applications such as neuroimaging, and it distinguishes our approach from prior work, which tends to focus either on a single node or on the whole network. In this mesoscale setting, we develop statistically sound projection-based tests for two-sample comparison in both weighted and binary edge networks. The key to our approach is to leverage network information from outside the set of interest to learn informative low-rank projections, which lead to more powerful tests.
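
The idea in the last sentence can be illustrated with a small sketch in Python (NumPy and SciPy). It first learns low-rank bases for the tested block from edges that lie outside it. It then projects the difference in group means onto those bases and compares the result with a chi-square reference distribution. This is not the authors' procedure. The helper names (`simulate`, `out_of_set_basis`, `projected_chi2_test`) are hypothetical, and several choices are illustrative assumptions:

- the tested set is an off-diagonal block with disjoint `rows` and `cols`;
- edges are independent Gaussian with a known noise level `sigma`;
- the bases come from a plain SVD of the pooled mean network.

Under these assumptions, the projections use only edges outside the block, so they are independent of the tested entries. The statistic is therefore exactly chi-square with d² degrees of freedom under the null.

```python
import numpy as np
from scipy.stats import chi2


def simulate(m, Theta, sigma, rng):
    """Draw m symmetric Gaussian-edge networks with mean matrix Theta (diagonal left noise-free)."""
    n = Theta.shape[0]
    E = np.triu(rng.normal(scale=sigma, size=(m, n, n)), 1)  # strict upper triangle of each network
    return Theta + E + np.transpose(E, (0, 2, 1))            # mirror so each network is symmetric


def out_of_set_basis(A_bar, rows, cols, d):
    """Hypothetical helper: top-d left singular vectors of the edges from `rows`
    to all nodes NOT in `cols`. None of these edges belong to the block rows x cols."""
    others = np.setdiff1d(np.arange(A_bar.shape[0]), cols)
    U, _, _ = np.linalg.svd(A_bar[np.ix_(rows, others)], full_matrices=False)
    return U[:, :d]


def projected_chi2_test(A1, A2, rows, cols, d, sigma):
    """Sketch of a projected two-sample test on the block rows x cols.
    Assumes disjoint rows/cols, independent Gaussian edges, and known sigma."""
    m1, m2 = A1.shape[0], A2.shape[0]
    A_bar = np.concatenate([A1, A2]).mean(axis=0)  # pooled mean; group labels are not used
    P = out_of_set_basis(A_bar, rows, cols, d)     # |rows| x d orthonormal basis
    Q = out_of_set_basis(A_bar, cols, rows, d)     # |cols| x d orthonormal basis
    D = A1[:, rows][:, :, cols].mean(axis=0) - A2[:, rows][:, :, cols].mean(axis=0)
    # Under H0 the d x d entries of Z are i.i.d. N(0, 1) because P, Q are orthonormal
    # and independent of the block entries.
    Z = P.T @ D @ Q / (sigma * np.sqrt(1 / m1 + 1 / m2))
    stat = float(np.sum(Z ** 2))
    return stat, float(chi2.sf(stat, df=d * d))


# Usage: rank-2 inner product latent space model, hypothetical edge set S = rows x cols.
rng = np.random.default_rng(0)
n, sigma, m, d = 40, 1.0, 20, 2
X = rng.normal(size=(n, 2)) + 1.0
Theta = X @ X.T
rows, cols = np.arange(0, 5), np.arange(5, 10)

Theta_alt = Theta.copy()                 # alternative: double the mean on S, kept symmetric
Theta_alt[np.ix_(rows, cols)] *= 2
Theta_alt[np.ix_(cols, rows)] *= 2

_, p_null = projected_chi2_test(simulate(m, Theta, sigma, rng),
                                simulate(m, Theta, sigma, rng), rows, cols, d, sigma)
_, p_alt = projected_chi2_test(simulate(m, Theta, sigma, rng),
                               simulate(m, Theta_alt, sigma, rng), rows, cols, d, sigma)
```

Projecting the d x d core rather than testing every entry of the block reduces the degrees of freedom from |rows|·|cols| to d². This is where the gain in power comes from when the difference lies in the learned subspaces.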

Paper Structure

This paper contains 42 sections, 16 theorems, 225 equations, 9 figures, 3 tables.

Key Result

Proposition 1

Suppose $U_{-[r]}$ and $V_{-[c]}$ have linearly independent columns. Then $\mathbb{T}$ has rank $d_{\mathcal{S}}$, and the column and row spaces of $\mathbb{T}$ coincide with the column and row spaces of $\Theta^{(1)}_{\mathcal{S}} - \Theta^{(2)}_{\mathcal{S}}$.

Figures (9)

  • Figure 1: Rejection rate for Gaussian edge networks with inner product latent space model. Dashed lines correspond to $\alpha=0.05$ and $1$. Point colors correspond to $2p$ or $d$, with basic tests displayed in black and oracle projection tests in orange.
  • Figure 2: Rejection rate for Gaussian edge networks with Euclidean distance latent space model. Dashed lines correspond to $\alpha=0.05$ and $1$. Point colors correspond to $2p$ or $d$, with basic tests displayed in black and oracle projection tests in orange.
  • Figure 3: Rejection rates for binary networks modeled with logistic link and the inner product latent space model. Dashed lines correspond to $\alpha=0.05$ and $1$. Point colors correspond to $2p$ or $d$, with basic tests displayed in black and oracle projection tests in orange.
  • Figure 4: Rejection rates for binary edge logistic link networks with inner product latent space model and overdispersion parameter $\eta=2$. Dashed lines correspond to $\alpha=0.05$ and $1$. Point colors correspond to $2p$ or $d$, with basic tests displayed in black and oracle projection tests in orange.
  • Figure 5: Median $-\log(p)$ for testing the CBM/CBM hypothesis in fMRI data as a function of sample size $m$. For $m < 20$, vertical bars span the 25th to 75th empirical quantiles over $200$ replications. The horizontal dotted line corresponds to $\alpha=0.05$.
  • ...and 4 more figures

Theorems & Definitions (36)

  • Remark 1
  • Remark 2
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Remark 3
  • Lemma 1
  • Corollary 1
  • Proposition 5
  • ...and 26 more