Generalized kernel distance covariance in high dimensions: non-null CLTs and power universality

Qiyang Han, Yandi Shen

Abstract

Distance covariance is a popular dependence measure for two random vectors $X$ and $Y$ of possibly different dimensions and types. Recent years have witnessed concentrated efforts in the literature to understand the distributional properties of the sample distance covariance in a high-dimensional setting, with an exclusive emphasis on the null case that $X$ and $Y$ are independent. This paper derives the first non-null central limit theorem for the sample distance covariance, and the more general sample (Hilbert-Schmidt) kernel distance covariance in high dimensions, primarily in the Gaussian case. The new non-null central limit theorem yields an asymptotically exact first-order power formula for the widely used generalized kernel distance correlation test of independence between $X$ and $Y$. The power formula in particular unveils an interesting universality phenomenon: the power of the generalized kernel distance correlation test is completely determined by $n\cdot \text{dcor}^2(X,Y)/\sqrt{2}$ in the high-dimensional limit, regardless of a wide range of choices of the kernels and bandwidth parameters. Furthermore, this separation rate is also shown to be optimal in a minimax sense. The key step in the proof of the non-null central limit theorem is a precise expansion of the mean and variance of the sample distance covariance in high dimensions, which shows, among other things, that the non-null Gaussian approximation of the sample distance covariance involves a rather subtle interplay between the dimension-to-sample ratio and the dependence between $X$ and $Y$.

Paper Structure

This paper contains 66 sections, 59 theorems, 427 equations, 3 figures.

Key Result

Proposition 2.1

The following holds: where the symmetric kernel can be either or Here $Z_i=(X_i,Y_i)$ for $i\in \mathbb{N}$, and $\sigma(1,2,3,4)$ denotes the set of all ordered permutations of $\{1,2,3,4\}$.
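
The displayed formulas of Proposition 2.1 are not reproduced here, so the following is only a minimal sketch under a stated assumption: that the sample distance covariance in question is the standard unbiased estimator built from U-centered Euclidean distance matrices (Székely and Rizzo), which is known to admit a fourth-order U-statistic representation. The function names `pairwise_dist`, `u_center`, `dcov2_unbiased`, and `dcor2_unbiased` are illustrative and do not come from the paper.

```python
import numpy as np


def pairwise_dist(x):
    """Euclidean distance matrix for the rows of an (n, p) array."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def u_center(d):
    """U-center a symmetric distance matrix (requires n >= 4 downstream)."""
    n = d.shape[0]
    row = d.sum(axis=1, keepdims=True)
    col = d.sum(axis=0, keepdims=True)
    total = d.sum()
    a = d - row / (n - 2) - col / (n - 2) + total / ((n - 1) * (n - 2))
    np.fill_diagonal(a, 0.0)  # U-centered matrices have a zero diagonal
    return a


def dcov2_unbiased(x, y):
    """Assumed estimator: unbiased sample squared distance covariance."""
    n = x.shape[0]
    a = u_center(pairwise_dist(x))
    b = u_center(pairwise_dist(y))
    return (a * b).sum() / (n * (n - 3))


def dcor2_unbiased(x, y):
    """Squared distance correlation formed from the unbiased estimators."""
    denom = np.sqrt(dcov2_unbiased(x, x) * dcov2_unbiased(y, y))
    return dcov2_unbiased(x, y) / denom
```

With `x` of shape `(n, p)` and `y` of shape `(n, q)`, the quantity `n * dcor2_unbiased(x, y) / np.sqrt(2)` would be a sample analogue of the $n\cdot \text{dcor}^2(X,Y)/\sqrt{2}$ scaling mentioned in the abstract. Whether this matches the paper's test statistic exactly is an assumption of this sketch.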

Figures (3)

  • Figure 1: Verification of CLTs. Solid lines correspond to the standard normal quantiles, and dashed lines correspond to sample quantiles with the identity, Gaussian, and Laplace kernels, respectively. Simulation parameters: $(n,p,q) = (1000,100,100)$, $B = 200$ replications, bandwidth choices $\rho_X = \rho_Y = \sqrt{2}$ for both Gaussian and Laplace kernels.
  • Figure 2: Verification of power universality in choice of bandwidth parameter (left and middle) and choice of kernel (right). Solid lines correspond to the standard normal quantiles, and dashed lines correspond to sample quantiles.
  • Figure 3: Verification of CLTs and power expansion for uniform (top 3 figures) and $t$- (bottom 3 figures) distributed data. Simulation parameters: $(n,p,q) = (100,100,100)$, $B = 200$ replications, bandwidth choices $\rho_X = \rho_Y = \sqrt{2}$ for both Gaussian and Laplace kernels.

Theorems & Definitions (106)

  • Proposition 2.1: yao2018testing, gao2021asymptotic
  • Theorem 2.2
  • Remark 2.3: Variance formula
  • Remark 2.4: Convergence rate
  • Theorem 2.5
  • Theorem 2.6
  • Theorem 2.7
  • Theorem 3.1
  • Corollary 3.2
  • Theorem 3.3
  • ...and 96 more