Sample Complexity of Correlation Detection in the Gaussian Wigner Model

Dong Huang; Pengkun Yang


TL;DR

The paper studies correlation detection between two unlabeled Gaussian Wigner graphs when two induced subgraphs of size $s$ are sampled from graphs with $n$ vertices. It establishes the optimal sample-size scaling $s^2 \asymp \left( \frac{n\log n}{\log\left(1/(1-\rho^2)\right)} \vee n \right)$ for reliable detection and provides both possibility and impossibility results, with a polynomial-time approximate detector based on clique seeds achieving practical performance. The analysis introduces an $f$-based similarity statistic and leverages the conditional second moment to handle partial observations, yielding two detectors: a maximal-overlap estimator and a minimal-mean-squared-error estimator, each with regime-specific thresholds. The work has practical implications for efficient correlation testing and privacy considerations in network data, and it outlines extensions to other graph models and computational-hardness perspectives through the low-degree framework.
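
As a rough numerical illustration of the scaling above, the sketch below evaluates the right-hand side of $s^2 \asymp \left( \frac{n\log n}{\log(1/(1-\rho^2))} \vee n \right)$ and returns its square root. The function name `sample_size_scale` is hypothetical, and the rate holds only up to unspecified constant factors, so the output indicates an order of magnitude rather than an exact threshold.

```python
import math


def sample_size_scale(n: int, rho: float) -> float:
    """Order of magnitude of s suggested by the stated rate.

    Computes sqrt( max( n*log(n) / log(1/(1-rho^2)), n ) ).
    Constant factors are ignored, so this is only a scale.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if not 0.0 < abs(rho) < 1.0:
        raise ValueError("rho must satisfy 0 < |rho| < 1")
    # log(1/(1-rho^2)) grows as |rho| approaches 1.
    info = math.log(1.0 / (1.0 - rho * rho))
    # The "∨" in the rate denotes a maximum of the two terms.
    s_squared = max(n * math.log(n) / info, float(n))
    return math.sqrt(s_squared)


if __name__ == "__main__":
    for rho in (0.1, 0.5, 0.9):
        print(rho, round(sample_size_scale(10_000, rho), 1))
```

Under this formula a weaker correlation calls for a larger sample, and once $\log(1/(1-\rho^2))$ exceeds $\log n$ the second term takes over and the scale no longer drops below $\sqrt{n}$.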

Abstract

Correlation analysis is a fundamental step in uncovering meaningful insights from complex datasets. In this paper, we study the problem of detecting correlations between two random graphs following the Gaussian Wigner model with unlabeled vertices. Specifically, the task is formulated as a hypothesis testing problem: under the null hypothesis, the two graphs are independent, while under the alternative hypothesis, they are edge-correlated through a latent vertex permutation, yet maintain the same marginal distributions as under the null. We focus on the scenario where two induced subgraphs, each with a fixed number of vertices, are sampled. We determine the optimal rate for the sample size required for correlation detection, derived through an analysis of the conditional second moment. Additionally, we propose an efficient approximate algorithm that significantly reduces running time.
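
To make the two hypotheses concrete, here is a minimal simulation sketch of the testing setup. It is not the authors' code. It assumes standard normal edge weights, a correlation parameter `rho`, and vertex subsets drawn uniformly at random; the names `sample_wigner_pair` and `induced_subgraph` are hypothetical.

```python
import math
import random


def sample_wigner_pair(n, rho, correlated, rng):
    """Sample two weighted graphs on n vertices as symmetric matrices.

    Null (correlated=False): edge weights of A and B are independent N(0, 1).
    Alternative (correlated=True): B[pi[i]][pi[j]] has correlation rho with
    A[i][j] for a hidden uniform permutation pi; marginals remain N(0, 1).
    """
    A = [[0.0] * n for _ in range(n)]
    B = [[0.0] * n for _ in range(n)]
    pi = list(range(n))
    if correlated:
        rng.shuffle(pi)  # latent vertex permutation
    noise_scale = math.sqrt(1.0 - rho * rho)
    for i in range(n):
        for j in range(i + 1, n):
            x = rng.gauss(0.0, 1.0)
            z = rng.gauss(0.0, 1.0)
            # rho*x + sqrt(1-rho^2)*z is N(0, 1) with correlation rho to x.
            y = rho * x + noise_scale * z if correlated else z
            A[i][j] = A[j][i] = x
            u, v = pi[i], pi[j]
            B[u][v] = B[v][u] = y
    return A, B, pi


def induced_subgraph(M, vertices):
    """Restrict the weighted adjacency matrix M to the given vertex list."""
    return [[M[u][v] for v in vertices] for u in vertices]


if __name__ == "__main__":
    rng = random.Random(0)
    n, s = 50, 10
    A, B, _ = sample_wigner_pair(n, rho=0.8, correlated=True, rng=rng)
    # Each observer sees only an induced subgraph on s sampled vertices.
    sub_A = induced_subgraph(A, rng.sample(range(n), s))
    sub_B = induced_subgraph(B, rng.sample(range(n), s))
    print(len(sub_A), len(sub_B))
```

A detector sees only `sub_A` and `sub_B`, without vertex labels, and must decide which hypothesis generated them. The two sampled vertex sets may correspond to each other only partially under the hidden permutation, which is why the sample size $s$ governs detectability.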

