Testing Correlation in Graphs by Counting Bounded Degree Motifs
Dong Huang, Pengkun Yang
TL;DR
The paper addresses the problem of testing correlation between two Erdős-Rényi graphs under a latent permutation, proposing a polynomial-time test based on counting bounded-degree motifs. By centering injective homomorphism counts and forming a statistic $\mathcal{T}_{\mathcal{M}}$, the authors establish detection guarantees for any constant $\rho$ in regimes $p \ge n^{-2/3}$ (and related sparse-dense refinements), removing the earlier requirement $\rho \ge \sqrt{\alpha}$, where $\alpha \approx 0.338$ is Otter's constant. The theoretical results hinge on $C$-admissible motif families: the Type I and Type II error analyses are tied to the motif signal score $\sum_{\mathsf{M}}\rho^{2\mathsf{e}(\mathsf{M})}$ and to robust second-moment control, and two concrete motif families are constructed to ensure practical computability. Numerical experiments on synthetic data and a co-citation network validate the approach, showing improved power from richer motif families and competitive performance against baselines. Overall, the work delivers a scalable framework for graph-correlation detection that aligns with computational hardness considerations while remaining practically effective for real networks.
Abstract
Correlation analysis is a fundamental step for extracting meaningful insights from complex datasets. In this paper, we investigate the problem of detecting correlation between two Erdős-Rényi graphs $G(n,p)$, formulated as a hypothesis testing problem: under the null hypothesis, the two graphs are independent, while under the alternative hypothesis, they are correlated. We develop a polynomial-time test by counting bounded degree motifs and prove its effectiveness for any constant correlation coefficient $\rho$ when the edge connection probability satisfies $p \ge n^{-2/3}$. Our results overcome the limitation requiring $\rho \ge \sqrt{\alpha}$, where $\alpha \approx 0.338$ is Otter's constant, extending the guarantee to any constant $\rho$. Methodologically, bounded degree motifs, which are ubiquitous in real networks, make the proposed statistic both natural and scalable. We also validate our method on synthetic and real co-citation networks, further confirming that this simple motif family effectively captures correlation signals and exhibits strong empirical performance.
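To make the testing setup concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard correlated Erdős-Rényi construction, in which each matched edge pair is a pair of Bernoulli($p$) variables with correlation $\rho$ and the second graph is relabeled by a uniformly random latent permutation. It then computes the product of centered edge counts, which corresponds to the simplest possible motif (a single edge). The function names `sample_correlated_er` and `centered_edge_statistic` are hypothetical, and the paper's statistic $\mathcal{T}_{\mathcal{M}}$ aggregates a richer family of bounded degree motifs that is not reproduced here.

```python
import numpy as np

def sample_correlated_er(n, p, rho, rng):
    """Sample two G(n, p) graphs whose matched edges have correlation rho,
    then hide the matching with a random vertex permutation (assumed model)."""
    iu = np.triu_indices(n, k=1)
    m = len(iu[0])
    a = rng.random(m) < p
    # Conditional law of b given a keeps the marginal at p and the correlation at rho:
    # P(b=1 | a=1) = p + rho*(1-p),  P(b=1 | a=0) = p*(1-rho).
    q = np.where(a, p + rho * (1 - p), p * (1 - rho))
    b = rng.random(m) < q

    A = np.zeros((n, n), dtype=int)
    A[iu] = a
    A = A + A.T
    B0 = np.zeros((n, n), dtype=int)
    B0[iu] = b
    B0 = B0 + B0.T

    perm = rng.permutation(n)          # latent permutation
    B = B0[np.ix_(perm, perm)]
    return A, B

def centered_edge_statistic(A, B, p):
    """Product of centered edge counts: the single-edge motif only.
    Invariant to vertex relabeling, so the latent permutation does not matter."""
    iu = np.triu_indices(A.shape[0], k=1)
    return (A[iu] - p).sum() * (B[iu] - p).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 0.1
    for rho in (0.0, 0.5):             # rho = 0 plays the role of the null
        vals = [centered_edge_statistic(*sample_correlated_er(n, p, rho, rng), p)
                for _ in range(200)]
        print(f"rho={rho}: mean statistic = {np.mean(vals):.1f}")
```

Under this assumed model the statistic has mean $\rho\, p(1-p)\binom{n}{2}$, so it is centered at zero when the graphs are independent and shifts upward when they are correlated. A single-edge count is only an illustration of the centering idea; it is not claimed to achieve the detection guarantees stated in the abstract.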
