Network two-sample test for block models

Chung Kyong Nguen, Oscar Hernan Madrid Padilla, Arash A. Amini

TL;DR

This work addresses the problem of two-sample testing for networks when there is no vertex correspondence, by modeling networks with SBMs and solving the graph-matching challenge via a spectral matching algorithm. The authors construct SBM-TS, a three-stage testing procedure that aligns estimated block connectivity matrices across samples and forms a global test statistic with asymptotic $\chi^2$ null distribution. They establish matching consistency, a null chi-square limit, and test consistency under alternatives, under mild sparsity and sample-size conditions, and validate the approach on synthetic data (SBMs, RDPG, graphons) and real datasets (COLLAB, SW-GOT). The method is computationally efficient, robust to mislabeling, and adaptable to multiple-sample settings, offering a principled framework for inference on network populations without node alignment. Overall, SBM-TS provides a practical and theoretically grounded tool for robust statistical inference in complex unlabeled network data with broad applicability.

Abstract

We consider the two-sample testing problem for networks, where the goal is to determine whether two sets of networks originated from the same stochastic model. Assuming no vertex correspondence and allowing for different numbers of nodes, we address a fundamental network testing problem that goes beyond simple adjacency matrix comparisons. We adopt the stochastic block model (SBM) for network distributions, due to its interpretability and its potential to approximate more general models. The lack of meaningful node labels and vertex correspondence translates to a graph matching challenge when developing a test for SBMs. We introduce an efficient algorithm to match estimated network parameters, allowing us to properly combine and contrast information within and across samples, leading to a powerful test. We show that the matching algorithm and the overall test are consistent under mild conditions on the sparsity of the networks and the sample sizes, and derive a chi-squared asymptotic null distribution for the test. Through a mixture of theoretical insights and empirical validations, including experiments with both synthetic and real-world data, this study advances robust statistical inference for complex network data.
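To make the graph matching challenge concrete, here is a minimal sketch of why block labels must be aligned before two connectivity matrices can be compared. It is not the matching algorithm of the paper: it uses an exhaustive search over all $K!$ relabelings, which is only feasible for small $K$. The function name `match_by_brute_force` and the example matrices are hypothetical and chosen purely for illustration.

```python
import itertools
import numpy as np

def match_by_brute_force(B1, B2):
    """Find the relabeling of B1's blocks that best matches B2.

    Illustrative only (NOT the paper's algorithm): tries every
    permutation of the K block labels and keeps the one with the
    smallest Frobenius distance to B2.
    """
    K = B1.shape[0]
    best_perm, best_err = None, np.inf
    for perm in itertools.permutations(range(K)):
        idx = list(perm)
        # Relabel rows and columns of B1 simultaneously.
        err = np.linalg.norm(B2 - B1[np.ix_(idx, idx)])
        if err < best_err:
            best_perm, best_err = idx, err
    return best_perm, best_err

# Hypothetical 3-block connectivity matrix with distinct diagonal entries.
B1 = np.array([[0.9, 0.2, 0.1],
               [0.2, 0.6, 0.3],
               [0.1, 0.3, 0.4]])

# The same model with its block labels shuffled.
relabel = [2, 0, 1]
B2 = B1[np.ix_(relabel, relabel)]

perm, err = match_by_brute_force(B1, B2)
naive_gap = np.linalg.norm(B2 - B1)  # comparison without matching
```

Here `naive_gap` is strictly positive even though both matrices describe the same model, while the matched error `err` is zero. A test that skipped the matching step would therefore see a spurious difference between the two samples.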

Paper Structure

This paper contains 46 sections, 13 theorems, 185 equations, 11 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

Consider two $K \times K$ matrices $B_1$ and $B_2$ with EVDs given by $B_r = Q_r \Lambda Q_r^{\top},\ r = 1, 2$, for some diagonal matrix $\Lambda$ with distinct diagonal entries. Then, a permutation matrix $P^*$ satisfies $B_2 = P^* B_1 P^{*\top}$ if and only if there exists a diagonal sign matrix $S^*$ such that $Q_2 = P^* Q_1 S^*$. Moreover, if $B_1$ is $(\theta, \eta)$-friendly, then there is at most one $P^*$ satisfying the first relation.
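The lemma can be checked numerically on a small example. The sketch below assumes the two relations in the lemma read $B_2 = P^* B_1 P^{*\top}$ and $Q_2 = P^* Q_1 S^*$; that form is an assumption, since the displayed equations are not preserved in this summary. The matrices, the permutation, and the helper `sorted_evd` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4

# Build a symmetric B1 with distinct eigenvalues (assumed example).
Q, _ = np.linalg.qr(rng.standard_normal((K, K)))
lam = np.array([4.0, 3.0, 2.0, 1.0])
B1 = Q @ np.diag(lam) @ Q.T

# A permutation matrix P and the relabeled matrix B2 = P B1 P^T.
P = np.eye(K)[[2, 0, 3, 1]]
B2 = P @ B1 @ P.T

def sorted_evd(B):
    """Eigendecomposition with eigenvalues sorted in decreasing order."""
    vals, vecs = np.linalg.eigh(B)   # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

lam1, Q1 = sorted_evd(B1)
lam2, Q2 = sorted_evd(B2)

# If Q2 = P Q1 S for a diagonal sign matrix S, then S = (P Q1)^T Q2.
S = (P @ Q1).T @ Q2
is_sign_matrix = np.allclose(np.abs(S), np.eye(K))
relation_holds = np.allclose(Q2, P @ Q1 @ S)
```

Because the eigenvalues are distinct, each eigenvector is determined up to sign, which is why `S` comes out diagonal with entries $\pm 1$. This sign ambiguity is what a spectral matching procedure has to resolve in addition to the permutation itself.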

Figures (11)

  • Figure 1: Schematic diagram of permutation recovery in Theorem 1. The solid and dashed straight arrows correspond to exact and approximate matches, respectively. The bent arrow represents an application of the matching algorithm $\mathcal{M}$.
  • Figure 2: ROC curves for the RDPG Experiment 1 (left) and Experiment 2 (right).
  • Figure 3: ROC curves for the graphon example.
  • Figure 4: ROC curves for the COLLAB dataset.
  • Figure 5: ROC curves for the SW-GOT dataset.
  • ...and 6 more figures

Theorems & Definitions (30)

  • Definition 1
  • Remark 1
  • Lemma 1
  • Theorem 1: Matching consistency
  • Remark 2
  • Theorem 2: Null distribution
  • Remark 4.1
  • Theorem 3: Consistency
  • Remark 3: Can joint community detection help?
  • Remark 4
  • ...and 20 more