Network two-sample test for block models

Chung Kyong Nguen; Oscar Hernan Madrid Padilla; Arash A. Amini

TL;DR

This work addresses two-sample testing for networks that have no vertex correspondence. It models each network with a stochastic block model (SBM) and resolves the resulting graph-matching problem with a spectral matching algorithm. The authors construct SBM-TS, a three-stage testing procedure that aligns estimated block connectivity matrices across samples and forms a global test statistic with an asymptotic $\chi^2$ null distribution. They establish matching consistency, the chi-square null limit, and consistency of the test under alternatives, all under mild conditions on sparsity and sample sizes, and they validate the approach on synthetic data (SBMs, RDPG, graphons) and real datasets (COLLAB, SW-GOT). The method is described as computationally efficient, robust to mislabeling, and adaptable to multiple-sample settings, offering a principled framework for inference on network populations without node alignment.
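To make the idea of a global statistic with a $\chi^2$ reference distribution concrete, here is a minimal sketch of a generic Wald-type comparison of two stacks of already-aligned $K \times K$ estimates. This is not the SBM-TS statistic, whose exact form is not given in this summary. The function name `chi2_two_sample`, the per-entry variance estimate, and the assumption that the upper-triangular entries are independent are all illustrative choices.

```python
import numpy as np
from scipy.stats import chi2


def chi2_two_sample(B_hat_1, B_hat_2):
    """Generic Wald-type comparison of two stacks of aligned K x K estimates.

    B_hat_1 has shape (N1, K, K) and B_hat_2 has shape (N2, K, K). Both are
    assumed to be aligned to a common block ordering already. Entries are
    treated as independent, which is a simplifying assumption.
    """
    B_hat_1 = np.asarray(B_hat_1, dtype=float)
    B_hat_2 = np.asarray(B_hat_2, dtype=float)
    K = B_hat_1.shape[1]
    iu = np.triu_indices(K)  # upper triangle, diagonal included
    x = B_hat_1[:, iu[0], iu[1]]  # shape (N1, d) with d = K (K + 1) / 2
    y = B_hat_2[:, iu[0], iu[1]]  # shape (N2, d)
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Variance of the difference of sample means, entry by entry.
    var = x.var(axis=0, ddof=1) / x.shape[0] + y.var(axis=0, ddof=1) / y.shape[0]
    stat = float(np.sum(diff**2 / var))
    df = len(iu[0])
    return stat, df, float(chi2.sf(stat, df))


# Toy data: noisy copies of one connectivity matrix (hypothetical values).
rng = np.random.default_rng(0)
base = np.array([[0.30, 0.05, 0.02],
                 [0.05, 0.25, 0.04],
                 [0.02, 0.04, 0.20]])
sample_1 = base + 0.02 * rng.standard_normal((30, 3, 3))
sample_2 = base + 0.02 * rng.standard_normal((40, 3, 3))
shifted = sample_2 + 0.2  # second sample drawn around a different matrix

stat_null, df, p_null = chi2_two_sample(sample_1, sample_2)
stat_alt, _, p_alt = chi2_two_sample(sample_1, shifted)
print(f"same mean:      stat={stat_null:.2f}, df={df}, p={p_null:.3f}")
print(f"different mean: stat={stat_alt:.2f}, df={df}, p={p_alt:.3g}")
```

The sketch only makes sense once the estimates share a block ordering, which is exactly what the matching stage has to provide.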

Abstract

We consider the two-sample testing problem for networks, where the goal is to determine whether two sets of networks originated from the same stochastic model. Assuming no vertex correspondence and allowing for different numbers of nodes, we address a fundamental network testing problem that goes beyond simple adjacency matrix comparisons. We adopt the stochastic block model (SBM) for network distributions because of its interpretability and its potential to approximate more general models. The lack of meaningful node labels and vertex correspondence translates to a graph matching challenge when developing a test for SBMs. We introduce an efficient algorithm to match estimated network parameters, allowing us to properly combine and contrast information within and across samples, leading to a powerful test. We show that the matching algorithm and the overall test are consistent under mild conditions on the sparsity of the networks and the sample sizes, and we derive a chi-squared asymptotic null distribution for the test. Through a mixture of theoretical insights and empirical validations, including experiments with both synthetic and real-world data, this study advances robust statistical inference for complex network data.
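The matching challenge can be illustrated with a short sketch: a block connectivity matrix estimated from community labels is only defined up to a relabeling of the communities, so estimates from different networks cannot be compared entry by entry until they are aligned. The helper names `sample_sbm` and `block_means`, the plug-in estimator, and the toy parameters below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def sample_sbm(B, z, rng):
    """Draw a symmetric, hollow adjacency matrix from an SBM with labels z."""
    P = B[np.ix_(z, z)]  # edge probability for every node pair
    upper = np.triu(rng.random(P.shape) < P, k=1)
    return (upper | upper.T).astype(float)


def block_means(A, z, K):
    """Plug-in estimate of the K x K connectivity matrix from A and labels z.

    Assumes every community has at least two nodes.
    """
    Z = np.eye(K)[z]  # n x K one-hot membership matrix
    counts = Z.sum(axis=0)
    edge_sums = Z.T @ A @ Z  # sums of A over ordered node pairs per block pair
    # Number of ordered pairs per block pair, excluding self-pairs.
    pair_counts = np.outer(counts, counts) - np.diag(counts)
    return edge_sums / pair_counts


rng = np.random.default_rng(1)
K = 3
B = np.array([[0.30, 0.05, 0.02],
              [0.05, 0.25, 0.04],
              [0.02, 0.04, 0.20]])
z = np.repeat(np.arange(K), 100)  # 300 nodes, 100 per community
A = sample_sbm(B, z, rng)

B_hat = block_means(A, z, K)

# Relabel the communities: the same partition, but with different names.
perm = np.array([2, 0, 1])
B_relabeled = block_means(A, perm[z], K)

# The relabeled estimate is a row/column permutation of the original one.
inv = np.argsort(perm)
print(np.allclose(B_relabeled, B_hat[np.ix_(inv, inv)]))
```

Because community detection returns labels in an arbitrary order, two networks drawn from the same SBM will in general produce estimates that differ by such a permutation.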

Paper Structure (46 sections, 13 theorems, 185 equations, 11 figures, 2 tables, 2 algorithms)

Introduction
Summary of results
Related work
Notation
Matching Methodology
Matching challenge
Spectral matching
Test construction
Main algorithm
Theory
Matching consistency
Null distribution
Test Consistency
Experimental results
Competing methods
...and 31 more sections

Key Result

Lemma 1

Consider two $K \times K$ matrices $B_1$ and $B_2$ with EVDs given by $B_r = Q_r \Lambda Q_r^{\top}$, $r = 1, 2$, for some diagonal matrix $\Lambda$ with distinct diagonal entries. Then a permutation matrix $P^*$ satisfies $B_2 = P^* B_1 P^{*\top}$ if and only if there exists a diagonal sign matrix $S^*$ such that $Q_2 = P^* Q_1 S^*$. Moreover, if $B_1$ is $(\theta, \eta)$-friendly, then there is at most one $P^*$ satisfying the first relation.
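The equivalence in Lemma 1 can be checked numerically. The sketch below reads the matching relation as $B_2 = P^* B_1 P^{*\top}$, which is an assumption about the convention, and recovers the permutation by matching rows of the entrywise absolute eigenvector matrices with a linear assignment. Taking absolute values is a simple way to remove the unknown signs in $S^*$. It is a stand-in chosen for illustration and not necessarily the paper's spectral matching algorithm, and it relies on the rows of $|Q_1|$ being distinct. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def sorted_evd(B):
    """Eigendecomposition of a symmetric matrix, eigenvalues in descending order."""
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def match_by_eigenvectors(B1, B2):
    """Return perm such that B2 is approximately B1[perm][:, perm].

    Rows of |Q2| are matched to rows of |Q1|, so the sign ambiguity of each
    eigenvector drops out. Assumes distinct eigenvalues.
    """
    _, Q1 = sorted_evd(B1)
    _, Q2 = sorted_evd(B2)
    # cost[i, j] = distance between row i of |Q2| and row j of |Q1|.
    cost = np.linalg.norm(np.abs(Q2)[:, None, :] - np.abs(Q1)[None, :, :], axis=2)
    _, cols = linear_sum_assignment(cost)
    return cols


rng = np.random.default_rng(0)
K = 4
# Random orthogonal Q and distinct eigenvalues (toy values, not probabilities).
Q, _ = np.linalg.qr(rng.standard_normal((K, K)))
B1 = Q @ np.diag([4.0, 3.0, 2.0, 1.0]) @ Q.T

true_perm = rng.permutation(K)
B2 = B1[np.ix_(true_perm, true_perm)]  # B2 = P B1 P^T for the matching P

perm = match_by_eigenvectors(B1, B2)
print(np.array_equal(perm, true_perm))
print(np.allclose(B2, B1[np.ix_(perm, perm)]))
```

With noisy estimates in place of exact matrices the eigenvectors only agree approximately, which is where a quantitative condition on $B_1$ such as the friendliness assumption in the lemma becomes relevant.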

Figures (11)

Figure 1: Schematic diagram of permutation recovery in the matching recovery theorem (prop:matching:recovery). The solid and dashed straight arrows correspond to exact and approximate matches, respectively. The bent arrow represents an application of the matching algorithm $\mathcal{M}$.
Figure 2: ROC curves for the RDPG Experiment 1 (left) and Experiment 2 (right).
Figure 3: ROC curves for the graphon example.
Figure 4: ROC curves for the COLLAB dataset.
Figure 5: ROC curves for the SW-GOT dataset.
...and 6 more figures

Theorems & Definitions (30)

Definition 1
Remark 1
Lemma 1
Theorem 1: Matching consistency
Remark 2
Theorem 2: Null distribution
Remark 4.1
Theorem 3: Consistency
Remark 3: Can joint community detection help?
Remark 4
...and 20 more
