Detection of Correlated Random Vectors

Dor Elimelech, Wasim Huleihel

TL;DR

This work addresses the problem of deciding whether two Gaussian vectors are correlated when matched via a random permutation, deriving sharp information-theoretic thresholds for strong and weak detection in the 1D setting and extending the analysis to high-dimensional partial correlations. The authors introduce a novel second-moment method based on an orthogonal decomposition into Hermite polynomials, revealing a surprising link to integer partitions and enabling tractable lower bounds. They also develop efficient counting-based tests that achieve strong/weak detection in various regimes and provide detailed comparisons to prior results. The results advance the theoretical understanding of data alignment and planted-structure detection, with implications for high-dimensional correlation discovery and practical detection algorithms.
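Here is a standard way to see where Hermite polynomials and integer partitions enter. It is sketched for $d=1$ and is not necessarily the paper's own derivation. It assumes that under ${\cal H}_1$ each pair $(\mathsf{X}_i,\mathsf{Y}_{\pi(i)})$ is standard bivariate normal with correlation $|\rho|<1$, and that $\pi$ is uniform over the symmetric group $S_n$. Here $f_\rho$ is the bivariate normal density and $\varphi$ is the standard normal density.

Mehler's formula expands the per-pair likelihood ratio in the probabilists' Hermite polynomials $\mathrm{He}_m$. Averaging $L^2$ over two independent permutations $\pi,\pi'$ leaves only $\sigma=\pi'^{-1}\circ\pi$. Each cycle of $\sigma$ of length $k$ closes a chain of $2k$ kernels, and orthogonality of the $\mathrm{He}_m$ reduces that chain to $\sum_m\rho^{2km}=(1-\rho^{2k})^{-1}$. Grouping permutations by cycle type, that is, by an integer partition $\lambda\vdash n$ with $m_k$ parts equal to $k$, gives:

```latex
\begin{gathered}
\ell_\rho(x,y) \triangleq \frac{f_\rho(x,y)}{\varphi(x)\,\varphi(y)}
  = \sum_{m\ge 0}\frac{\rho^m}{m!}\,\mathrm{He}_m(x)\,\mathrm{He}_m(y),
\qquad
L(\mathsf{X},\mathsf{Y}) = \frac{1}{n!}\sum_{\pi\in S_n}\prod_{i=1}^{n}\ell_\rho\big(\mathsf{X}_i,\mathsf{Y}_{\pi(i)}\big),
\\[6pt]
\mathbb{E}_{{\cal H}_0}\big[L^2\big]
  = \frac{1}{n!}\sum_{\sigma\in S_n}\ \prod_{c\,\in\,\mathrm{cycles}(\sigma)}\frac{1}{1-\rho^{2|c|}}
  = \sum_{\lambda\vdash n}\ \prod_{k\ge 1}\frac{1}{k^{m_k}\,m_k!}\Big(\frac{1}{1-\rho^{2k}}\Big)^{m_k}
  \ \xrightarrow[n\to\infty]{}\ \prod_{j\ge 1}\frac{1}{1-\rho^{2j}} = \sum_{m\ge 0} p(m)\,\rho^{2m}.
\end{gathered}
```

Here $p(m)$ is the number of integer partitions of $m$, and the last equality is Euler's generating function. A second moment that stays bounded in $n$ rules out strong detection, and one tending to $1$ rules out weak detection. The limit is finite for fixed $\rho^2<1$ and tends to $1$ as $\rho\to 0$, which illustrates the kind of bound behind the impossibility results.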

Abstract

In this paper, we investigate the problem of deciding whether two standard normal random vectors $\mathsf{X}\in\mathbb{R}^{n}$ and $\mathsf{Y}\in\mathbb{R}^{n}$ are correlated or not. This is formulated as a hypothesis testing problem. Under the null hypothesis, these vectors are statistically independent, while under the alternative, $\mathsf{X}$ and a randomly and uniformly permuted version of $\mathsf{Y}$ are correlated with correlation $\rho$. We analyze the thresholds at which optimal testing is information-theoretically impossible and possible, as a function of $n$ and $\rho$. To derive our information-theoretic lower bounds, we develop a novel technique for evaluating the second moment of the likelihood ratio using an orthogonal polynomial expansion, which, among other things, reveals a surprising connection to integer partition functions. We also study a multi-dimensional generalization of the above setting, where rather than two vectors we observe two databases/matrices, and furthermore allow for partial correlations between these two.
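A minimal simulation sketch of the two hypotheses may help fix the model. It assumes a convention in which, under ${\cal H}_1$, row $i$ of $\mathsf{X}$ is paired with row $\pi(i)$ of $\mathsf{Y}$ for a uniform permutation $\pi$. For the multi-dimensional case it further assumes that every coordinate of a matched pair has correlation $\rho$; the partial-correlation variant is not modeled. The helper name `sample` is hypothetical, and $d=1$ recovers the vector setting.

```python
import numpy as np


def sample(n, d, rho, under_h1, rng):
    """Hypothetical helper: draw databases X, Y in R^{n x d} under H0 or H1.

    H0: X and Y are independent with i.i.d. N(0, 1) entries.
    H1 (assumed convention): row i of X and row pi(i) of Y are entrywise
        correlated with correlation rho, for a uniformly random permutation pi.
    Returns (X, Y, pi), with pi = None under H0.
    """
    X = rng.standard_normal((n, d))
    if not under_h1:
        return X, rng.standard_normal((n, d)), None
    pi = rng.permutation(n)
    noise = rng.standard_normal((n, d))
    Y = np.empty((n, d))
    # Y[pi[i]] = rho * X[i] + sqrt(1 - rho^2) * noise[i]; each entry of Y stays N(0, 1).
    Y[pi] = rho * X + np.sqrt(1.0 - rho**2) * noise
    return X, Y, pi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, Y, pi = sample(n=1000, d=1, rho=0.9, under_h1=True, rng=rng)
    # The raw correlation is near zero; re-aligning Y by pi recovers roughly rho.
    print(np.corrcoef(X[:, 0], Y[:, 0])[0, 1], np.corrcoef(X[:, 0], Y[pi, 0])[0, 1])
```

The permutation is returned only so the planted structure can be inspected. A test sees just `X` and `Y`, and that is why detection is hard.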

Paper Structure (13 sections, 16 theorems, 139 equations, 3 figures, 2 tables)

Key Result

Theorem 1

Consider the detection problem in eqn:decproblem. For any sequence $(\rho,n)=(\rho_k,n_k)_k$ such that $\rho^2=1-\Omega(1)$, the optimal testing risk does not tend to zero as $k\to\infty$. Namely, strong detection is impossible.
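Theorem 1 is an impossibility result of the second-moment type. As a hedged numerical companion, assume for $d=1$ (a standard computation, not necessarily the paper's derivation) that the second moment of the likelihood ratio equals an average over a uniformly random permutation $\sigma$ of $n$ elements. The averaged quantity is the product, over the cycles $c$ of $\sigma$, of $1/(1-\rho^{2|c|})$.

The sketch below evaluates this average in three ways:
- with the cycle-index recurrence $m\,z_m=\sum_{k=1}^{m}a_k z_{m-k}$, where $a_k=1/(1-\rho^{2k})$;
- by brute force over $S_n$ for small $n$, as a check;
- through its $n\to\infty$ limit $\prod_{j\ge1}(1-\rho^{2j})^{-1}=\sum_m p(m)\rho^{2m}$, where $p(m)$ is the integer partition function.

All function names are hypothetical. A bounded second moment rules out strong detection, so values that stay bounded for fixed $\rho^2<1$ are consistent with Theorem 1.

```python
import itertools
import math


# Hypothetical helpers (not code from the paper). They evaluate
#   E_sigma[ prod over cycles c of sigma of 1 / (1 - rho^(2|c|)) ],
# with sigma uniform on S_n: the assumed second-moment expression for d = 1.


def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a tuple mapping i -> perm[i]."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        lengths.append(length)
    return lengths


def second_moment(n, rho):
    """Cycle-index recurrence m * z[m] = sum_{k=1..m} a_k * z[m-k], z[0] = 1, in O(n^2)."""
    a = [0.0] + [1.0 / (1.0 - rho ** (2 * k)) for k in range(1, n + 1)]
    z = [1.0] + [0.0] * n
    for m in range(1, n + 1):
        z[m] = sum(a[k] * z[m - k] for k in range(1, m + 1)) / m
    return z[n]


def second_moment_bruteforce(n, rho):
    """Same quantity by averaging over all n! permutations (small n only)."""
    total = 0.0
    for perm in itertools.permutations(range(n)):
        term = 1.0
        for length in cycle_lengths(perm):
            term /= 1.0 - rho ** (2 * length)
        total += term
    return total / math.factorial(n)


def second_moment_limit(rho, terms=10_000):
    """Truncated Euler product prod_{j>=1} 1/(1 - rho^(2j)) = sum_m p(m) rho^(2m)."""
    return math.prod(1.0 / (1.0 - rho ** (2 * j)) for j in range(1, terms + 1))


if __name__ == "__main__":
    rho = 0.7
    print(second_moment(6, rho), second_moment_bruteforce(6, rho))
    for n in (10, 100, 1000):
        print(n, second_moment(n, rho), second_moment_limit(rho))
```

The recurrence takes $O(n^2)$ arithmetic operations, so it never enumerates the $n!$ permutations or the $p(n)$ cycle types. The values increase with $n$ toward the Euler product, which is finite whenever $\rho^2<1$.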

Figures (3)

  • Figure 1: An illustration of the detection problem. On the left are the uncorrelated vectors under the null hypothesis ${\cal H}_0$. On the right are the vectors $\mathsf{X}$ and $\mathsf{Y}$ under the hypothesis ${\cal H}_1$, where correlated elements are marked with a similar color.
  • Figure 2: The risk of the count test $\phi_{\mathsf{count}}$ as a function of $\rho$, for $d=1$, $n=100$, and values of $\rho^2$ between $1-n^{-2}$ and $1-n^{-5}$. The results are compatible with Theorem th:lowerStrong, showing that the risk indeed vanishes when $\rho^2=1-o(n^{-4})$.
  • Figure 3: An illustration of the correlation structure under the alternative hypothesis ${\cal H}_1$. Correlated elements are marked with a similar color, while the remaining mutually independent elements are colorless.
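The paper defines the count test's exact statistic and threshold, and they are not reproduced here. As a hedged illustration only, the sketch below assumes a hypothetical statistic for $d=1$: the number of pairs $(i,j)$ with $|\mathsf{X}_i-\mathsf{Y}_j|\le\tau$, with ${\cal H}_1$ declared when this count is at least 1. It estimates the risk by Monte Carlo, taking the risk to be the sum of the two error probabilities; that convention is itself an assumption.

A heuristic motivates the scaling. Under ${\cal H}_0$, about $n^2\tau/\sqrt{\pi}$ pairs fall within $\tau$ by chance, so $\tau$ must be $o(n^{-2})$ to avoid false alarms. Under ${\cal H}_1$, a matched pair differs by roughly $\sqrt{1-\rho^2}$ when $\rho$ is near 1, so it is caught only if $\sqrt{1-\rho^2}\ll\tau$. Together these point to the $\rho^2=1-o(n^{-4})$ scaling seen in Figure 2.

```python
import numpy as np


def count_close_pairs(x, y, tau):
    """Number of pairs (i, j) with |x[i] - y[j]| <= tau, via sorting (O(n log n))."""
    ys = np.sort(y)
    lo = np.searchsorted(ys, x - tau, side="left")   # first y_j >= x_i - tau
    hi = np.searchsorted(ys, x + tau, side="right")  # first y_j >  x_i + tau
    return int(np.sum(hi - lo))


def count_test(x, y, tau, threshold=1):
    """Hypothetical count test: return 1 (declare H1) if at least `threshold` close pairs."""
    return int(count_close_pairs(x, y, tau) >= threshold)


def estimate_risk(n, rho, tau, trials=200, seed=0):
    """Monte Carlo estimate of P_H0(test = 1) + P_H1(test = 0)."""
    rng = np.random.default_rng(seed)
    false_alarms = misses = 0
    for _ in range(trials):
        # H0: independent standard normal vectors.
        x, y = rng.standard_normal(n), rng.standard_normal(n)
        false_alarms += count_test(x, y, tau)
        # H1: y[pi[i]] = rho * x[i] + sqrt(1 - rho^2) * z[i] for a uniform permutation pi.
        x = rng.standard_normal(n)
        perm = rng.permutation(n)
        y = np.empty(n)
        y[perm] = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        misses += 1 - count_test(x, y, tau)
    return (false_alarms + misses) / trials


if __name__ == "__main__":
    n = 100
    for exponent in (2, 3, 4, 5):
        rho = np.sqrt(1.0 - float(n) ** -exponent)
        print(exponent, estimate_risk(n, rho, tau=float(n) ** -2.5))
```

The choice $\tau=n^{-2.5}$, between the two heuristic scales, is arbitrary. Sweeping the exponent roughly mirrors the range of $\rho^2$ in Figure 2, but it is not a reproduction of that experiment.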

Theorems & Definitions (26)

  • Definition 1
  • Theorem 1: Impossibility of strong detection
  • Theorem 2: Impossibility of weak detection
  • Theorem 3: Count test strong detection
  • Theorem 4: Comparison test weak detection
  • Lemma 1
  • Proof
  • Definition 2
  • Proof of Theorem th:lowerWeak
  • Lemma 2
  • ...and 16 more