A Pure Hypothesis Test for Inhomogeneous Random Graph Models Based on a Kernelised Stein Discrepancy

Anum Fatima, Gesine Reinert

TL;DR

This work introduces IRG-gKSS, a pure hypothesis test for assessing the fit of a pre-specified inhomogeneous random graph model using a kernelised Stein discrepancy tailored to graphs. It defines a Stein operator for IRGs, derives a computable graph-kernel statistic, and implements a Monte Carlo testing framework that requires only a single observed network and no asymptotic regime assumptions. The method demonstrates strong power against structured alternatives (e.g., planted hubs/cliques) and yields plausible inferences on real networks, with theoretical guarantees including a non-asymptotic normal approximation under fairly general conditions. Practical considerations include the use of graph kernels like Weisfeiler-Lehman, edge-resampling for large graphs, and the potential for multiple kernels to improve robustness and power. Overall, IRG-gKSS provides a principled, scalable, and assumption-light tool for validating IRG models in network analysis.
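The Monte Carlo step described above can be sketched as follows: simulate networks from the pre-specified IRG, compute a statistic on each, and compare the observed value with the simulated ones. This is a minimal sketch, not the paper's implementation. The function names, the default of 200 simulations, and in particular the triangle-count statistic are assumptions made for illustration; the triangle count is only a stand-in for the IRG-gKSS statistic, which requires a graph kernel.

```python
import numpy as np

def sample_irg(p, rng):
    """Draw a symmetric 0/1 adjacency matrix with independent edges,
    where pair (i, j), i < j, is present with probability p[i, j]."""
    n = p.shape[0]
    upper = np.triu(rng.random((n, n)) < p, k=1)  # decide each pair i < j once
    return (upper | upper.T).astype(int)

def triangle_count(x):
    """Placeholder statistic (an assumption, NOT IRG-gKSS): number of triangles."""
    return int(np.trace(x @ x @ x) // 6)

def monte_carlo_pvalue(x_obs, p, statistic, n_sim=200, seed=0):
    """One-sided Monte Carlo p-value from a single observed network:
    large values of the statistic count as evidence against IRG(p)."""
    rng = np.random.default_rng(seed)
    t_obs = statistic(x_obs)
    t_sim = np.array([statistic(sample_irg(p, rng)) for _ in range(n_sim)])
    # The +1 terms count the observed network among the simulated ones.
    return (1 + np.sum(t_sim >= t_obs)) / (n_sim + 1)
```

Calling `monte_carlo_pvalue(x_obs, p, triangle_count)` returns a value in (0, 1]; the null model is rejected at level alpha when that value is at most alpha. Any other function of the adjacency matrix can be passed in place of `triangle_count`.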

Abstract

Complex data are often represented as a graph, which in turn can often be viewed as a realisation of a random graph, such as an inhomogeneous random graph model (IRG). For general fast goodness-of-fit tests in high dimensions, kernelised Stein discrepancy (KSD) tests are a powerful tool. Here, we develop a KSD-type test for IRG models that can be carried out with a single observation of the network. The test applies to a network of any size, but is particularly interesting for small networks for which asymptotic tests are not warranted. We also provide theoretical guarantees.

Paper Structure

This paper contains 46 sections, 5 theorems, 115 equations, 24 figures, 8 tables, 5 algorithms.

Key Result

Proposition 3.1

For a graph $\mathcal{G} = (\mathcal{V},\mathcal{E})$ with adjacency matrix $\mathbf{X} \sim \text{IRG}(\mathbf{p})$, the operator SE_ermm_op is a Stein operator with Stein class $\mathcal{F}(\mathcal{A}) = \{ f: \{0,1\}^{N} \rightarrow \mathbb{R}\}$, that is, for all $f: \{0,1\}^{N} \rightarrow \ma
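The mean-zero property that makes an operator a Stein operator can be checked numerically on a small example. The statement above is truncated and does not show the operator's definition, so the sketch below assumes a Glauber-dynamics form: for each edge indicator $s$, the operator averages $p_s f(x^{(s,1)}) + (1-p_s) f(x^{(s,0)}) - f(x)$. That form, the function names, and the example probabilities are assumptions for illustration and may differ from the paper's definition.

```python
import itertools
import numpy as np

def stein_operator(f, x, p):
    """Assumed Glauber-type operator: average over edge indicators s of
    p_s * f(x with s set to 1) + (1 - p_s) * f(x with s set to 0) - f(x)."""
    total = 0.0
    for s in range(len(x)):
        x1 = x[:s] + (1,) + x[s + 1:]
        x0 = x[:s] + (0,) + x[s + 1:]
        total += p[s] * f(x1) + (1 - p[s]) * f(x0) - f(x)
    return total / len(x)

def expected_stein(f, p_model, p_data):
    """Exact expectation of (A f)(X), where A is built from p_model and
    X has independent Bernoulli(p_data[s]) coordinates."""
    n = len(p_model)
    expectation = 0.0
    for x in itertools.product((0, 1), repeat=n):
        prob = np.prod([p_data[s] if x[s] else 1 - p_data[s] for s in range(n)])
        expectation += prob * stein_operator(f, x, p_model)
    return expectation

# N = 6 edge indicators, e.g. the 6 vertex pairs of a 4-vertex graph.
p = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
q = np.full(6, 0.5)  # a different edge-probability vector

# An arbitrary test function on {0,1}^6, stored as a lookup table.
rng = np.random.default_rng(1)
table = {x: rng.normal() for x in itertools.product((0, 1), repeat=6)}
f_random = table.__getitem__

under_model = expected_stein(f_random, p, p)  # zero up to rounding error
under_other = expected_stein(sum, p, q)       # f(x) = number of edges
```

For the edge-count function `sum`, flipping any single indicator from 0 to 1 changes $f$ by exactly 1, so the expectation under the mismatched vector `q` reduces to the difference of the mean edge probabilities, `p.mean() - q.mean()`, which is nonzero here.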

Figures (24)

  • Figure 1: Power of the test to assess the fit of an ERMM$(\mathbf{n}, \mathbf{Q})$ to the network of size 50 with planted hubs. The numbers in boxes at the top of the plot are the average maximum degree observed in $m$ repetitions of the test for each setting on the $x$-axis. Left: we fix $k=4$, the size of the hub, and let $R$, the number of hubs, vary; right: we fix $R=3$ and let $k$ vary.
  • Figure 2: Power of the tests for the fit of an ER$(50,0.06)$ to the network of size 50 with a planted clique of size $K$ using different proportions of edge resampling as well as the no edge resampling version of the IRG-gKSS test statistic. The numbers in the boxes are the total number of networks sampled from the ER$(50,0.06)$ model that had the number of edges less than $\binom{K}{2}$ to plant a clique of size $K$ and were, therefore, not used in the experiment. The number is the total of $m= 50$ repetitions of the test.
  • Figure 3: Execution time to calculate IRG-gKSS, using the WL kernel with $h=3$, for networks simulated from ER$(n, 0.06)$ models. Left: dependence on size $n$, for different numbers $M$ of simulated networks; right: dependence on $M$, for different sizes $n$.
  • Figure 4: Power of the tests for the fit of an ERMM$(\mathbf{n}, \mathbf{Q})$ model to a network of size 50 with planted hubs, with $\mathbf{n}$ from \ref{eq:nnumbers} and $\mathbf{Q}$ from \ref{eq:matQ}. The numbers in the boxes at the top of the plot are the average maximum degrees observed in $m=50$ repetitions of the test for each setting on the $x$-axis. Left: we fix $k=4$ and let $R$ vary; right: we fix $R=3$ and let $k$ vary. For this experiment, we use a WL kernel with $h = 2$.
  • Figure 5: Power of the tests for the fit of an ERMM$(\mathbf{n}_{ub}, \mathbf{Q})$ model to a network of size 50 with planted hubs, with $\mathbf{n}_{ub}$ from \ref{eq:nnumbers} and $\mathbf{Q}$ from \ref{eq:matQ}. The numbers in the boxes at the top of the plot are the average maximum degrees observed in $m=50$ repetitions of the test for each setting on the $x$-axis. Left: we fix $k=4$ and let $R$ vary; right: we fix $R=3$ and let $k$ vary. For this experiment, we use a WL kernel with $h = 3$.
  • ...and 19 more figures
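The planted-clique alternative in Figure 2 can be illustrated with a short sketch. The caption only states that sampled networks with fewer than $\binom{K}{2}$ edges were not used; the scheme below, which relocates existing edges into the clique so that the total number of edges is unchanged, is an assumption consistent with that remark and not necessarily the authors' procedure. The function names are hypothetical.

```python
from math import comb
import numpy as np

def sample_er(n, prob, rng):
    """Symmetric 0/1 adjacency matrix of an ER(n, prob) graph."""
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return (upper | upper.T).astype(int)

def plant_clique(x, k, rng):
    """Return (y, members), where y has a clique on k random vertices and the
    same number of edges as x, or None if x has fewer than C(k, 2) edges."""
    n = x.shape[0]
    if x.sum() // 2 < comb(k, 2):
        return None  # not enough edges to relocate; such a network is skipped
    members = rng.choice(n, size=k, replace=False)
    inside = np.zeros((n, n), dtype=bool)
    inside[np.ix_(members, members)] = True
    iu, ju = np.triu_indices(n, k=1)            # all vertex pairs i < j
    in_clique = inside[iu, ju]
    present = x[iu, ju] == 1
    missing = np.flatnonzero(in_clique & ~present)   # clique pairs to add
    outside = np.flatnonzero(~in_clique & present)   # edges available to move
    drop = rng.choice(outside, size=len(missing), replace=False)
    y = x.copy()
    y[iu[missing], ju[missing]] = y[ju[missing], iu[missing]] = 1
    y[iu[drop], ju[drop]] = y[ju[drop], iu[drop]] = 0
    return y, members
```

With `x = sample_er(50, 0.06, rng)`, a call such as `plant_clique(x, 5, rng)` yields a network with the same edge count as `x` in which the chosen five vertices are pairwise connected.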

Theorems & Definitions (12)

  • Proposition 3.1
  • Theorem 5.2
  • proof
  • Lemma A.1
  • proof
  • Theorem A.2
  • proof
  • Corollary A.3
  • Remark A.4
  • proof
  • ...and 2 more