Sparse PCA: Phase Transitions in the Critical Regime

Michael J. Feldman, Theodor Misiakiewicz, Elad Romanov

TL;DR

This work analyzes sparse PCA in Johnstone's spiked covariance model under the critical sparsity regime $m/\sqrt{n}\to\beta$, $p/n\to\gamma$, by introducing generalized covariance thresholding (GCT) based on kernel PCA. It establishes a BBP-like phase transition: above a kernel-dependent threshold $\lambda_*(f,\gamma,\beta)$, kernel PCA yields an informative signal with a nonzero alignment to the spike and enables exact support recovery via thresholding; below the threshold, the principal component is asymptotically orthogonal to the spike. The paper provides explicit limiting formulas for eigenvalues and eigenvector correlations for polynomial and non-polynomial kernels, proves the existence of an optimal kernel $f^*$ (and near-optimal soft-thresholding), and shows that as $\beta\to\infty$ the phase transition converges to the BBP limit. Numerical results corroborate the theory, demonstrating robust detection and recovery with adaptive thresholding and kernel selection, and highlighting the practical impact for efficiently recovering sparse principal components in high-dimensional data.

Abstract

This work studies estimation of sparse principal components in high dimensions. Specifically, we consider a class of estimators based on kernel PCA, generalizing the covariance thresholding algorithm proposed by Krauthgamer et al. (2015). Focusing on Johnstone's spiked covariance model, we investigate the "critical" sparsity regime, where the sparsity level $m$, sample size $n$, and dimension $p$ each diverge and $m/\sqrt{n} \rightarrow \beta$, $p/n \rightarrow \gamma$. Within this framework, we develop a fine-grained understanding of signal detection and recovery. Our results establish a detectability phase transition, analogous to the Baik--Ben Arous--Péché (BBP) transition: above a certain threshold -- depending on the kernel function, $\gamma$, and $\beta$ -- kernel PCA is informative. Conversely, below the threshold, kernel principal components are asymptotically orthogonal to the signal. Notably, above this detection threshold, we find that consistent support recovery is possible with high probability. Sparsity plays a key role in our analysis, and results in more nuanced phenomena than in related studies of kernel PCA with delocalized (dense) components. Finally, we identify optimal kernel functions for detection -- and consequently, support recovery -- and numerical calculations suggest that soft thresholding is nearly optimal.
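To make the estimator described in the abstract concrete, the following is a minimal Python sketch of covariance thresholding with a soft-thresholding kernel on simulated spiked data. Several details are assumptions made for illustration and are not taken from the paper: the population covariance is taken to be $I_p + (\lambda - 1) v v^\top$, the kernel matrix is formed as $n^{-1/2} f(\sqrt{n}\, S_{ij})$ off the diagonal with a zeroed diagonal, and the support is estimated by keeping the $m$ largest entries of the top eigenvector in absolute value. The function names (`sample_spiked_data`, `gct_top_eigenvector`, `estimate_support`, `run_demo`) are hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft-thresholding kernel: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sample_spiked_data(n, p, m, lam, rng):
    """Draw n rows with covariance I + (lam - 1) v v^T (assumed parameterization).

    v is m-sparse with entries +-1/sqrt(m) on a uniformly random support.
    """
    support = rng.choice(p, size=m, replace=False)
    v = np.zeros(p)
    v[support] = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
    noise = rng.standard_normal((n, p))
    factor = rng.standard_normal(n)
    x = noise + np.sqrt(lam - 1.0) * np.outer(factor, v)
    return x, v


def gct_top_eigenvector(x, t):
    """Top eigenvector of the thresholded, rescaled sample covariance.

    Assumed normalization: K_ij = f(sqrt(n) * S_ij) / sqrt(n) for i != j, K_ii = 0.
    """
    n, _ = x.shape
    s = x.T @ x / n
    k = soft_threshold(np.sqrt(n) * s, t) / np.sqrt(n)
    np.fill_diagonal(k, 0.0)
    _, eigvecs = np.linalg.eigh(k)  # eigenvalues are returned in ascending order
    return eigvecs[:, -1]


def estimate_support(v_hat, m):
    """Simplified support estimate: indices of the m largest |entries|."""
    return np.sort(np.argsort(np.abs(v_hat))[-m:])


def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    n, p, m, lam = 2000, 400, 10, 5.0  # strong signal, chosen for illustration
    x, v = sample_spiked_data(n, p, m, lam, rng)
    v_hat = gct_top_eigenvector(x, t=2.0)
    cosine = abs(v_hat @ v)  # eigenvector sign is arbitrary
    true_support = np.flatnonzero(v)
    return cosine, estimate_support(v_hat, m), true_support


cosine, recovered, truth = run_demo()
print(f"cosine similarity: {cosine:.3f}")
print("support recovered exactly:", np.array_equal(recovered, truth))
```

The signal strength here is deliberately far above any detection threshold, so the sketch only illustrates the mechanics. Probing the phase transition itself would require $\lambda$ close to the threshold and much larger $n$ and $p$, as in the paper's simulations.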

Paper Structure

This paper contains 25 sections, 43 theorems, 276 equations, and 6 figures.

Key Result

Theorem 1.1

Let $a_0 = 0$, $\|f\|_\phi^2 < \infty$, and $f$ be bounded on compact sets. The ESD of ${\boldsymbol K}_0(f)$ converges weakly almost surely to a continuous probability measure $\mu$ on $\mathbb{R}$. The Stieltjes transform $s(z)$ of $\mu$ solves equation (eq:stj_trans). For $z \in \mathbb{C}^+$, this equation has a unique solution $s(z)$ with ${\rm Im}(s(z)) > 0$.
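For readers less familiar with the objects in this theorem, the sketch below computes the empirical Stieltjes transform $s_p(z) = \frac{1}{p}\sum_i (\lambda_i - z)^{-1}$ of a matrix's eigenvalues and checks that its imaginary part is positive for $z$ in the upper half-plane. The matrix used is a generic symmetric Gaussian matrix, chosen only as a stand-in; it is not the kernel matrix ${\boldsymbol K}_0(f)$ of the paper, and the helper name `empirical_stieltjes` is hypothetical.

```python
import numpy as np


def empirical_stieltjes(eigenvalues, z):
    """Empirical Stieltjes transform: mean of 1 / (lambda_i - z)."""
    return np.mean(1.0 / (eigenvalues - z))


rng = np.random.default_rng(0)
p = 300
g = rng.standard_normal((p, p))
w = (g + g.T) / np.sqrt(2 * p)  # stand-in symmetric matrix, not the paper's K_0(f)
eigenvalues = np.linalg.eigvalsh(w)

z = 0.5 + 0.1j  # a point in the upper half-plane C^+
s = empirical_stieltjes(eigenvalues, z)
print("Im s(z) > 0:", s.imag > 0)
```

Positivity of the imaginary part follows from ${\rm Im}\,(\lambda - z)^{-1} = {\rm Im}(z)/|\lambda - z|^2$, which is the property that singles out the relevant solution in the theorem.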

Figures (6)

  • Figure 1: Stylized illustration of our results as $p/n \rightarrow \gamma$. Area (I) is the recovery region of PCA, the union of areas (I) and (II) is the recovery region of GCT, and (III) is the GCT impossibility region. The boundary between regions (II) and (III), the curve $\lambda_*(\gamma, \beta)$, is the phase transition location of the optimal kernel function, characterized in Section 2.3. As $\beta \rightarrow \infty$, $\lambda_*(\gamma,\beta) \rightarrow 1+ \sqrt{\gamma}$ (the BBP transition).
  • Figure 2: Cosine similarities for GCT with the kernel $\eta_s(\cdot, 2)$ (blue) and standard PCA (orange). There is close agreement between Theorem \ref{thrm:B} (solid lines) and simulations (points, each representing the average over 50 simulations). On the left, $n = 10{,}000$, $p = 5{,}000$, and $m = 25$, so $\beta = 1/4$. On the right, $n = 10{,}000$, $p = 5{,}000$, and $m = 50$, so $\beta = 1/2$. Observe that the phase transition of GCT decreases with $\beta$.
  • Figure 3: Simulations of GCT with the kernel $\eta_s(\cdot, 2)$ (blue) and standard PCA (orange). Points represent the fraction of 50 simulations in which ${\boldsymbol v}$ was recovered. On the left, $n = 10{,}000$, $p = 5{,}000$, and $m = 25$, so $\beta = 1/4$. On the right, $n = 10{,}000$, $p = 5{,}000$, and $m = 50$, so $\beta = 1/2$.
  • Figure 4: The optimal phase transition of GCT (left) and soft thresholding (right), for $\gamma \in \{.5,1,1.5\}$ and $\beta \in [.1,2.5]$. Notice that (1) $\lambda_{s,*}(\gamma,\beta)$ and $\lambda_{*}(\gamma,\beta)$ are visually nearly indistinguishable, suggesting that soft thresholding is close to optimal, and (2) $\lambda_{*}(\gamma,\beta) \rightarrow 1+ \sqrt{\gamma}$ as $\beta \rightarrow \infty$, supporting Lemma \ref{prop_betainf}.
  • Figure 5: Cosine similarities for soft thresholding with adaptive threshold selection (blue) and the optimal threshold $t_*(\gamma,\beta)$ (orange). Here, $n = 10{,}000$ and $p = 5{,}000$, so $\gamma = .5$. At each value of $\beta \in [.1,2]$, we set the signal strength to be $\lambda = \lambda_{s,*}(\gamma,\beta) + .1$. Each point represents the average of 50 simulations. Interestingly, for intermediate values of $\beta$, adaptive thresholding empirically outperforms the optimal fixed level $t_*(\gamma,\beta)$.
  • ...and 1 more figure
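The captions above repeatedly compare GCT against the BBP transition at $1 + \sqrt{\gamma}$. As a reference point, the classical limits for standard PCA can be written out as follows. This is a sketch under an assumed parameterization, chosen to be consistent with the threshold $1+\sqrt{\gamma}$ quoted in the captions: population covariance $\Sigma = I_p + (\lambda - 1) v v^\top$ with $\|v\|_2 = 1$ and $p/n \to \gamma$. The symbols $\hat\Sigma$ and $\hat v_1$ (sample covariance and its top eigenvector) are notation introduced here, not taken from the source.

```latex
% Assumed model: \Sigma = I_p + (\lambda - 1) v v^\top, \|v\|_2 = 1, p/n \to \gamma.
\lambda_1(\hat\Sigma) \;\longrightarrow\;
\begin{cases}
\lambda \left( 1 + \dfrac{\gamma}{\lambda - 1} \right), & \lambda > 1 + \sqrt{\gamma}, \\[2ex]
(1 + \sqrt{\gamma})^2, & \lambda \le 1 + \sqrt{\gamma},
\end{cases}
\qquad
\langle \hat v_1, v \rangle^2 \;\longrightarrow\;
\begin{cases}
\dfrac{1 - \gamma/(\lambda - 1)^2}{1 + \gamma/(\lambda - 1)}, & \lambda > 1 + \sqrt{\gamma}, \\[2ex]
0, & \lambda \le 1 + \sqrt{\gamma}.
\end{cases}
```

For the setting of Figures 2 and 3, $\gamma = 5{,}000/10{,}000 = 0.5$, so the PCA threshold is $1 + \sqrt{0.5} \approx 1.707$. Both limits are continuous at the threshold: the eigenvalue limit equals $(1+\sqrt{\gamma})^2$ and the squared cosine equals $0$ there.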

Theorems & Definitions (76)

  • Theorem 1.1
  • Corollary 1.2
  • Theorem 2.1
  • Theorem 2.2
  • Remark 2.1
  • Theorem 2.3
  • Theorem 2.4
  • Remark 2.2
  • Remark 2.3
  • Corollary 2.5
  • ...and 66 more