Detection of Correlated Random Vectors

Dor Elimelech; Wasim Huleihel

TL;DR

This work addresses the problem of deciding whether two Gaussian vectors are correlated when their entries are matched through an unknown random permutation. The authors derive sharp information-theoretic thresholds for strong and weak detection in the 1D setting and extend the analysis to high-dimensional partial correlations. They introduce a novel second-moment method based on an orthogonal decomposition into Hermite polynomials, which reveals a surprising link to integer partitions and enables tractable lower bounds. They also develop efficient tests, a count test and a comparison test, that achieve strong and weak detection, respectively, in various regimes, and provide detailed comparisons to prior results. The results advance the theoretical understanding of data alignment and planted-structure detection, with implications for high-dimensional correlation discovery and practical detection algorithms.
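
Hermite expansions are convenient for Gaussian correlation problems because of a standard orthogonality identity (a consequence of Mehler's formula): for standard normals $X, Y$ with correlation $\rho$, $\mathbb{E}[\mathrm{He}_j(X)\,\mathrm{He}_k(Y)] = \delta_{jk}\,k!\,\rho^k$, where $\mathrm{He}_k$ are the probabilists' Hermite polynomials. The sketch below checks this identity numerically; it is an illustration of the underlying tool, not the paper's second-moment computation. The helper names `he` and `hermite_cross_moment` are hypothetical; the NumPy calls `hermeval` and `hermegauss` are real.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as H


def he(k, x):
    """Probabilists' Hermite polynomial He_k evaluated elementwise at x."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    return H.hermeval(x, coeffs)


def hermite_cross_moment(j, k, rho, nodes=30):
    """E[He_j(X) He_k(Y)] for standard normals X, Y with correlation rho.

    Writes Y = rho * X + sqrt(1 - rho^2) * Z with Z independent of X and
    evaluates the 2D Gaussian expectation by Gauss-HermiteE quadrature,
    which is exact for polynomials of degree <= 2 * nodes - 1 in each variable.
    """
    x, w = H.hermegauss(nodes)        # weight exp(-x^2 / 2); sum(w) = sqrt(2 pi)
    w = w / math.sqrt(2 * math.pi)    # normalize to the N(0, 1) density
    X, Z = np.meshgrid(x, x, indexing="ij")
    W = np.outer(w, w)
    Y = rho * X + math.sqrt(1 - rho**2) * Z
    return float(np.sum(W * he(j, X) * he(k, Y)))


# Diagonal terms give k! * rho^k; off-diagonal terms vanish.
print(hermite_cross_moment(3, 3, 0.5))  # close to 3! * 0.5**3 = 0.75
print(hermite_cross_moment(2, 3, 0.5))  # close to 0
```

Quadrature is used instead of Monte Carlo so that the check is deterministic and exact up to floating-point error.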

Abstract

In this paper, we investigate the problem of deciding whether two standard normal random vectors $\mathsf{X}\in\mathbb{R}^{n}$ and $\mathsf{Y}\in\mathbb{R}^{n}$ are correlated or not. This is formulated as a hypothesis testing problem where, under the null hypothesis, these vectors are statistically independent, while under the alternative, $\mathsf{X}$ and a randomly and uniformly permuted version of $\mathsf{Y}$ are correlated with correlation $\rho$. We analyze the thresholds at which optimal testing is information-theoretically impossible and possible, as a function of $n$ and $\rho$. To derive our information-theoretic lower bounds, we develop a novel technique for evaluating the second moment of the likelihood ratio using an orthogonal polynomial expansion, which, among other things, reveals a surprising connection to integer partition functions. We also study a multi-dimensional generalization of the above setting, where rather than two vectors we observe two databases/matrices, and furthermore allow for partial correlations between the two.
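
To make the two hypotheses concrete, here is a minimal sampler for the 1D problem. It assumes that each correlated pair is jointly Gaussian, so that the partner of $\mathsf{X}_i$ can be written as $\rho\,\mathsf{X}_i + \sqrt{1-\rho^2}\,\mathsf{Z}_i$ with independent noise $\mathsf{Z}$; the function name `sample_pair` is hypothetical. A test only observes `(x, y)`; the permutation `sigma` is drawn internally and discarded.

```python
import numpy as np


def sample_pair(n, rho, alternative, rng=None):
    """Draw (X, Y) under the null (independent) or the alternative hypothesis.

    Under the alternative, each pair (X[i], Y[sigma(i)]) is standard bivariate
    normal with correlation rho, where sigma is a uniformly random permutation
    that the tester does not observe. Both marginals are N(0, I_n) in either case.
    """
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(n)
    if not alternative:
        return x, rng.standard_normal(n)
    # Correlated partner of each X[i]; still N(0, 1) marginally.
    w = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    sigma = rng.permutation(n)
    y = np.empty(n)
    y[sigma] = w  # Y[sigma(i)] = W[i], so X[i] is paired with Y[sigma(i)]
    return x, y


# With rho = 1 the alternative makes Y an exact (hidden) permutation of X.
x, y = sample_pair(5, 1.0, alternative=True, rng=0)
print(np.allclose(np.sort(x), np.sort(y)))  # True
```

Because the marginals coincide under both hypotheses, any useful test must look at the joint configuration of $\mathsf{X}$ and $\mathsf{Y}$ while being robust to the unknown matching.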

Paper Structure

This paper contains 13 sections, 16 theorems, 139 equations, 3 figures, and 2 tables.

Introduction
Model Formulation
Phase Transition in 1D
Lower Bound via Polynomial Decomposition
Hermite polynomials and Hilbert spaces
Proof outline
The proof of the lower bound
Partial Correlation in High Dimensions
Lower bounds
Upper bounds
Conclusion and Outlook
Proof of Lemma \ref{lem:simplecalc}
Proof of Proposition \ref{prop:nastybound}

Key Result

Theorem 1

Consider the detection problem in \eqref{eqn:decproblem}. For any sequence $(\rho,n)=(\rho_k,n_k)_k$ such that $\rho^2=1-\Omega(1)$, the optimal testing risk does not vanish as $k\to\infty$; namely, strong detection is impossible.

Figures (3)

Figure 1: An illustration of the detection problem. On the left are the uncorrelated vectors under the null hypothesis ${\cal H}_0$. On the right are the vectors $\mathsf{X}$ and $\mathsf{Y}$ under the alternative hypothesis ${\cal H}_1$, where correlated elements are marked with the same color.
Figure 2: The risk of the count test $\phi_{\mathsf{count}}$ as a function of $\rho$, for $d=1$, $n=100$, and values of $\rho^2$ ranging from $1-n^{-2}$ to $1-n^{-5}$. The results are compatible with Theorem \ref{th:lowerStrong}, showing that the risk indeed vanishes when $\rho^2=1-o(n^{-4})$.
Figure 3: An illustration of the correlation structure under the alternative hypothesis ${\cal H}_1$. Correlated elements are marked with the same color, while the remaining, mutually independent elements are left uncolored.
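
The multi-dimensional, partially correlated setting described in the abstract and in Figure 3 can be sketched in the same way. This is a hypothetical generator, not the paper's definition: it assumes that $k$ of the $n$ rows of the $n\times d$ database $\mathsf{X}$ are chosen uniformly at random, that each is paired with a distinct row of $\mathsf{Y}$ through a hidden uniform permutation, and that paired rows have entrywise correlation $\rho$, while all other entries are i.i.d. $\mathcal{N}(0,1)$. The name `sample_partial` is illustrative.

```python
import numpy as np


def sample_partial(n, d, k, rho, rng=None):
    """Hypothetical partial-correlation alternative for two n x d databases.

    Assumptions (illustration only): k rows of X are chosen uniformly at random,
    each is matched to row sigma[r] of Y by a hidden uniform permutation sigma,
    and matched rows are entrywise rho-correlated; every other entry of X and Y
    is i.i.d. N(0, 1). Returns the hidden (rows, sigma) alongside the data.
    """
    rng = np.random.default_rng(rng)
    x = rng.standard_normal((n, d))
    y = rng.standard_normal((n, d))
    rows = rng.choice(n, size=k, replace=False)  # correlated rows of X
    sigma = rng.permutation(n)                   # hidden row matching
    noise = rng.standard_normal((k, d))
    # Overwrite the matched rows of Y with rho-correlated copies of X's rows.
    y[sigma[rows]] = rho * x[rows] + np.sqrt(1.0 - rho**2) * noise
    return x, y, rows, sigma


x, y, rows, sigma = sample_partial(n=20, d=3, k=5, rho=0.8, rng=0)
print(x.shape, y.shape, len(rows))  # (20, 3) (20, 3) 5
```

Setting $k=n$ and $d=1$ recovers the fully correlated 1D model; a detector would see only `x` and `y`, not `rows` or `sigma`.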

Theorems & Definitions (26)

Definition 1
Theorem 1: Impossibility of strong detection
Theorem 2: Impossibility of weak detection
Theorem 3: Count test strong detection
Theorem 4: Comparison test weak detection
Lemma 1
Proof
Definition 2
Proof of Theorem \ref{th:lowerWeak}
Lemma 2
...and 16 more
