Nonparametric two-sample hypothesis testing for low-rank random graphs of differing sizes

Joshua Agterberg, Minh Tang, Carey Priebe

TL;DR

This work formalizes the notion of "equality of distribution" under the framework of the generalized random dot product graph, which considers as special cases a number of popular network models with low-rank expectations, and proposes a nonparametric two-sample test statistic to conduct this test.

Abstract

Given two networks of differing sizes, it is of interest to test whether the two networks are drawn from the same distribution. We formalize the notion of "equality of distribution" under the framework of the generalized random dot product graph, which includes as special cases a number of popular network models with low-rank expectations. We then propose a nonparametric two-sample test statistic to conduct this test, assuming only that the networks have independent edges generated from low-rank probability matrices. Our proposed test statistic applies the maximum mean discrepancy to suitably rotated rows of a graph embedding, where the rotation is estimated using optimal transport. We show that our test statistic, appropriately scaled, is consistent for sufficiently dense graphs, and we study its convergence under different sparsity regimes. We demonstrate our results in numerical simulations.
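As a rough illustration of the maximum mean discrepancy step mentioned in the abstract, the sketch below computes the standard unbiased U-statistic estimate of the squared MMD between two point clouds of different sizes. It is not the paper's test statistic: the Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions made for illustration, and the rotation estimated by optimal transport is omitted, so the rows are assumed to be already aligned.

```python
import numpy as np


def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_unbiased(X, Y, bandwidth=1.0):
    """Unbiased U-statistic estimate of squared MMD; X is (n, d), Y is (m, d)."""
    n, m = X.shape[0], Y.shape[0]
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    # Within-sample terms exclude the diagonal (pairs i != j only).
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * Kxy.mean()
```

In this sketch `X` and `Y` stand in for the embedded rows of the two graphs, so `n` and `m` may differ. The estimate is close to zero when both samples come from the same distribution and grows as the distributions separate.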

Paper Structure

This paper contains 21 sections, 15 theorems, 188 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Theorem 3.1

Suppose Assumptions \ref{assumption:kappa} and \ref{sparsity1} hold. Then under the null hypothesis $F_X \simeq F_Y$ there exist two sequences of block-orthogonal matrices $\mathbf{\hat{W}}_n$ and $\mathbf{W}_n$ such that almost surely. If instead $F_X \not \simeq F_Y$, then almost surely,

Figures (3)

  • Figure 1: Hypothesis testing results with different values of $\nu$, where $\nu$ represents a local departure from the null hypothesis described in \ref{sec:sims}. Power increases with $n$, tending to one more slowly for sparser networks, which reflects our theory. The dotted line represents level $.05$. When curves overlap, the larger value of $\nu$ takes precedence. For $\nu = .05$, the smallest local departure from the null, the test detects the departure only when $n$ is large and the networks are sufficiently dense. Indeed, in the rightmost figure, the power curve nearly overlaps with that of the null hypothesis.
  • Figure 2: Hypothesis testing results for degree-corrected stochastic blockmodel departures from the null hypothesis, where larger $\gamma$ represents more degree heterogeneity. Power tends to increase with $n$, though the improvement is slower, as these networks are much sparser than in the previous example. When curves overlap, the larger value of $\gamma$ takes precedence. For example, in the rightmost figure (sparsity $= .6$), the power curves associated with $\gamma \in \{0,.1,.2\}$ completely overlap at zero.
  • Figure 3: Diagram of the alignment matrices and where they come from. Both $\mathbf{\tilde{Q}_X}$ and $\mathbf{W_X}$ come from Lemma \ref{lem:qtilde}, whereas the matrix $\mathbf{W_*^X}$ comes from Lemma \ref{lem:frobenius} (or Theorem \ref{thm:grdpg}).

Theorems & Definitions (34)

  • Definition 2.1
  • Definition 2.2: Rubin-Delanchy et al. (2022)
  • Definition 2.3
  • Definition 2.4: Adjacency Spectral Embedding
  • Theorem 3.1
  • Theorem 3.2
  • Corollary 3.3
  • Corollary 3.4
  • Proposition 3.5
  • Theorem 3.6
  • ...and 24 more
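
Definition 2.4 in the list above names the adjacency spectral embedding, though its statement is not reproduced here. As a hedged sketch of one common form of this embedding for generalized random dot product graphs, the code below keeps the `d` eigenvalues of largest magnitude and scales the matching eigenvectors by the square roots of their absolute values. The function name and the choice to return the selected eigenvalues are assumptions for illustration and may differ from the paper's exact definition.

```python
import numpy as np


def adjacency_spectral_embedding(A, d):
    """Embed a symmetric matrix A into d dimensions.

    Returns an (n, d) array of embedded rows and the d selected eigenvalues.
    """
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(A)
    # Indices of the d eigenvalues with largest absolute value.
    top = np.argsort(np.abs(eigvals))[::-1][:d]
    # Scale each selected eigenvector by sqrt(|eigenvalue|).
    X_hat = eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))
    return X_hat, eigvals[top]
```

The signs of the selected eigenvalues record the indefinite signature: when `A` has rank exactly `d`, multiplying `X_hat` by the diagonal matrix of those signs and then by the transpose of `X_hat` reproduces `A`.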