Tracy-Widom, Gaussian, and Bootstrap: Approximations for Leading Eigenvalues in High-Dimensional PCA

Nina Dörnemann, Miles E. Lopes

TL;DR

The paper addresses the problem of identifying whether the leading eigenvalue fluctuations in high-dimensional PCA follow Tracy-Widom behavior (subcritical) or Gaussian fluctuations (supercritical). It introduces a hypothesis test based on $T_n = \frac{n^{2/3}}{\widehat{\sigma}_n}(\lambda_1(\widehat{\Sigma})-\lambda_2(\widehat{\Sigma}))$, with a consistently estimated scale $\widehat{\sigma}_n$ and a subcritical-consistent bootstrap for functionals of leading eigenvalues. The authors prove asymptotic level control under $\mathsf{H}_{0,n}$ and power consistency under alternatives with $K$ supercritical spikes, and they establish bootstrap consistency in the subcritical regime. Numerical experiments and stock-market data illustrate the approach's superior power over gap-ratio methods and its practical relevance for high-dimensional inference in PCA.
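As a rough illustration of the statistic $T_n$ described above, the following Python sketch computes the scaled gap between the two largest sample eigenvalues. It is not the authors' implementation. The function name `leading_gap_statistic` is hypothetical. The sketch assumes mean-zero data and the normalization $\widehat{\Sigma} = X^\top X / n$. The scale estimate $\widehat{\sigma}_n$ is not specified in this summary, so it is passed in as a plain argument.

```python
import numpy as np

def leading_gap_statistic(X, sigma_hat):
    """Return n^{2/3} / sigma_hat * (lambda_1 - lambda_2) for the sample covariance.

    X         : (n, p) data matrix, rows are observations (assumed mean zero).
    sigma_hat : positive scale estimate; its construction is NOT shown here
                and must be supplied by the caller (assumption of this sketch).
    """
    n, _ = X.shape
    # Sample covariance without centering (assumed normalization: 1/n).
    S = X.T @ X / n
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(S)
    lam1, lam2 = eigvals[-1], eigvals[-2]
    return n ** (2.0 / 3.0) / sigma_hat * (lam1 - lam2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 100))
    # sigma_hat = 1.0 is a placeholder, not a consistent estimate.
    print(leading_gap_statistic(X, sigma_hat=1.0))
```

With a placeholder scale such as `sigma_hat=1.0`, the output is only the raw $n^{2/3}$-scaled eigenvalue gap. A calibrated test would also need the paper's scale estimate and the corresponding critical value.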

Abstract

Under certain conditions, the largest eigenvalue of a sample covariance matrix undergoes a well-known phase transition when the sample size $n$ and data dimension $p$ diverge proportionally. In the subcritical regime, this eigenvalue has fluctuations of order $n^{-2/3}$ that can be approximated by a Tracy-Widom distribution, while in the supercritical regime, it has fluctuations of order $n^{-1/2}$ that can be approximated with a Gaussian distribution. However, the statistical problem of determining which regime underlies a given dataset is far from resolved. We develop a new testing framework and procedure to address this problem. In particular, we demonstrate that the procedure has an asymptotically controlled level, and that it is power consistent for certain alternatives. Also, this testing procedure enables the design of a new bootstrap method for approximating the distributions of functionals of the leading sample eigenvalues within the subcritical regime -- which is the first such method that is supported by theoretical guarantees.

Paper Structure

This paper contains 22 sections, 18 theorems, 148 equations, 2 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

Suppose that assumptions ass_mp_regime-ass_lsd are satisfied and that $\mathsf{H}_{0,n}$ holds for all large $n$. Then, the estimates $\widehat{\xi}_n$, $\widehat{\sigma}_n$, and $\tilde{H}_n$ satisfy the following limits as $n\to\infty$

Figures (2)

  • Figure 1: A conceptual comparison of the proposed statistic $T_n$ with $R_n(\kappa)$ for $\kappa=1,2,3$ in detecting four instances of $\mathsf{H}_{1,n}$. Green cells correspond to cases where a test is expected to have power approaching 1 as $n\to\infty$. Yellow cells correspond to cases where $R_n(\kappa)$ may have power approaching 1 as $n\to\infty$, but the power may be reduced from taking a maximum over several gap ratios. Red cells correspond to cases where $R_n(\kappa)$ is not expected to have power approaching 1 as $n\to\infty$.
  • Figure 2: Rejection probabilities for the tests $T_n$ (triangle), $R_n(1)$ (diamond) and $R_n(10)$ (circle), plotted as a function of $\lambda_1(\mathbf{\Sigma})$. In each panel, the arrows below the x-axis specify the values of $\lambda_1(\mathbf{\Sigma})$ corresponding to $\mathsf{H}_{0,n}$ and $\mathsf{H}_{1,n}(1)$. The nominal level of $\alpha=0.05$ is marked with a horizontal line in each panel. First row: Spiked spectrum with $x_{11} \sim \mathcal{N}(0,1)$ (left panel) and $x_{11} \sim t_{10}/\sqrt{\operatorname{var}(t_{10})}$ (right panel). Second row: Decaying spectrum with $x_{11}\sim\mathcal{N}(0,1)$, and decay parameter $c=1$ (left panel) and $c=0.5$ (right panel).

Theorems & Definitions (35)

  • Proposition 1
  • Theorem 1
  • Corollary 1
  • Proposition 2
  • Theorem 2
  • Corollary 2
  • Theorem 3
  • Proposition 3
  • Remark 1
  • Lemma 1
  • ...and 25 more