Testing Correlation in Graphs by Counting Bounded Degree Motifs

Dong Huang, Pengkun Yang

TL;DR

The paper addresses the problem of testing correlation between two Erdős-Rényi graphs under a latent permutation, proposing a polynomial-time test based on counting bounded-degree motifs. By centering injective homomorphism counts and forming a statistic $\mathcal{T}_{\mathcal{M}}$, the authors establish detection guarantees for any constant $\rho$ in regimes $p \ge n^{-2/3}$ (and related sparse-dense refinements), overcoming prior Otter-constant limitations. Theoretical results hinge on $C$-admissible motif families, with Type I and II error analyses tied to the motif signal score $\sum_{\mathsf{M}}\rho^{2\mathsf{e}(\mathsf{M})}$ and robust second-moment control; two concrete motif families are constructed to ensure practical computability. Numerical experiments on synthetic data and a co-citation network validate the approach, showing improved power from richer motif families and competitive performance against baselines. Overall, the work delivers a scalable, information-rich framework for graph-correlation detection that aligns with computational hardness considerations while remaining practically effective for real networks.
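To make "centering injective homomorphism counts" concrete, the following minimal sketch computes one such count for the simplest non-trivial bounded-degree motif, a path with two edges. It assumes the centering subtracts the edge density $p$ from every off-diagonal adjacency entry; the function names are hypothetical and the paper's weights $\omega_{\mathsf{M}}$ and full statistic $\mathcal{T}_{\mathcal{M}}$ are not reproduced here.

```python
def centered_wedge_count(adj, p):
    """Centered injective homomorphism count of a 2-edge path (illustrative).

    Sums (adj[i][j] - p) * (adj[j][k] - p) over ordered triples of
    distinct vertices (i, j, k), where j is the middle vertex of the path.
    """
    n = len(adj)
    total = 0.0
    for j in range(n):
        # Centered entries of row j, skipping the diagonal.
        row = [adj[j][i] - p for i in range(n) if i != j]
        s = sum(row)
        # s * s ranges over all (i, k) pairs; subtract the i == k terms.
        total += s * s - sum(x * x for x in row)
    return total


def centered_wedge_count_bruteforce(adj, p):
    """Same quantity by direct enumeration, useful as a cross-check."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i != j and j != k and i != k:
                    total += (adj[i][j] - p) * (adj[j][k] - p)
    return total


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with no centering (p = 0).
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print(centered_wedge_count(path, 0.0))
```

The fast version costs $O(n^2)$ instead of $O(n^3)$ because the injectivity constraint for this motif only removes the terms with $i = k$. Larger motifs need more careful bookkeeping of repeated vertices.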

Abstract

Correlation analysis is a fundamental step for extracting meaningful insights from complex datasets. In this paper, we investigate the problem of detecting correlation between two Erdős-Rényi graphs $G(n,p)$, formulated as a hypothesis testing problem: under the null hypothesis, the two graphs are independent, while under the alternative hypothesis, they are correlated. We develop a polynomial-time test by counting bounded degree motifs and prove its effectiveness for any constant correlation coefficient $\rho$ when the edge connection probability satisfies $p\ge n^{-2/3}$. Our results overcome the limitation requiring $\rho\ge \sqrt{\alpha}$, where $\alpha\approx 0.338$ is Otter's constant, extending the guarantee to any constant $\rho$. Methodologically, bounded degree motifs -- ubiquitous in real networks -- make the proposed statistic both natural and scalable. We also validate our method on synthetic and real co-citation networks, further confirming that this simple motif family effectively captures correlation signals and exhibits strong empirical performance.
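The two hypotheses can be simulated with a short sampler. The sketch below is an assumption-laden illustration, not the paper's Definition 1 (which is not reproduced in this excerpt): it uses the standard correlated Erdős-Rényi construction in which each aligned edge pair is a pair of Bernoulli($p$) variables with correlation $\rho$, and then hides the alignment behind a uniformly random vertex permutation. Setting `rho=0` gives the null hypothesis. The names `sample_correlated_pair` and `edge_correlation` are hypothetical.

```python
import random


def sample_correlated_pair(n, p, rho, seed=0):
    """Sample (A, B, perm): two graphs with edge density p and correlation rho.

    B is relabeled by a random permutation so the alignment is latent.
    Assumes 0 <= rho <= 1 and 0 < p < 1.
    """
    rng = random.Random(seed)
    a = [[0] * n for _ in range(n)]
    b0 = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a_ij = 1 if rng.random() < p else 0
            # Conditional law of the second edge keeps its marginal at p
            # and makes the covariance equal to rho * p * (1 - p).
            prob_b = p + rho * (1 - p) if a_ij else p * (1 - rho)
            b_ij = 1 if rng.random() < prob_b else 0
            a[i][j] = a[j][i] = a_ij
            b0[i][j] = b0[j][i] = b_ij
    perm = list(range(n))
    rng.shuffle(perm)
    # Relabel: b[i][j] = b0[perm[i]][perm[j]].
    b = [[b0[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
    return a, b, perm


def edge_correlation(a, b, perm):
    """Pearson correlation of aligned edge indicators, given the true perm."""
    n = len(a)
    inv = [0] * n
    for i, u in enumerate(perm):
        inv[u] = i
    xs, ys = [], []
    for u in range(n):
        for v in range(u + 1, n):
            xs.append(a[u][v])
            ys.append(b[inv[u]][inv[v]])  # undo the relabeling
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    a, b, perm = sample_correlated_pair(200, 0.3, 0.5, seed=1)
    print(round(edge_correlation(a, b, perm), 2))
```

`edge_correlation` uses the hidden permutation and is only a sanity check on the sampler. A test in the setting described above sees only the two graphs, which is why permutation-invariant statistics such as motif counts are used.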

Paper Structure

This paper contains 28 sections, 10 theorems, 122 equations, 9 figures.

Key Result

Theorem 1

For a $C$-admissible motif family $\mathcal{M}$, there exist $\tau,\omega_{\mathsf{M}}\in \mathbb{R}$ such that,

Figures (9)

  • Figure 1: Logical flow from admissibility to detection.
  • Figure 2: A special bounded degree motif with vertex set size $N_{\mathsf{v}}$, edge set size $N_{\mathsf{e}}$, and maximal degree $d$.
  • Figure 3: Histograms (left) and boxplots (right) of the bounded degree motif counting statistic $\mathcal{T}_{\mathcal{M}(N_{\mathsf{e}},d)}$ with $N_{\mathsf{e}}=d=4$ for $n=100$, $p=0.05$, and $\rho=0.99$.
  • Figure 4: Comparison of the proposed test statistic $\mathcal{T}_{\mathcal{M}(N_{\mathsf{e}},d)}$ with $N_{\mathsf{e}} = d = 4$ for fixed $p$ and varying correlation parameter $\rho\in \left\{ 0.6,0.7,0.8,0.9,0.99 \right\}$.
  • Figure 5: Comparison with counting-based baselines on synthetic graphs.
  • ...and 4 more figures

Theorems & Definitions (16)

  • Definition 1: Correlated Erdős-Rényi graph
  • Definition 2
  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Remark 1: Asymmetry between Type I and Type II errors
  • Lemma 1
  • Lemma 2
  • Theorem 2
  • Theorem 3
  • ...and 6 more