Sparse PCA: Phase Transitions in the Critical Regime
Michael J. Feldman, Theodor Misiakiewicz, Elad Romanov
TL;DR
This work analyzes sparse PCA in Johnstone's spiked covariance model in the critical sparsity regime $m/\sqrt{n}\to\beta$, $p/n\to\gamma$, using generalized covariance thresholding (GCT), a family of estimators based on kernel PCA. It establishes a BBP-like phase transition. Above a kernel-dependent threshold $\lambda_*(f,\gamma,\beta)$, the top kernel principal component has nonvanishing correlation with the spike, and thresholding it yields consistent support recovery with high probability. Below the threshold, the principal component is asymptotically orthogonal to the spike. The paper gives explicit limiting formulas for the eigenvalues and eigenvector correlations of polynomial and non-polynomial kernels and proves the existence of an optimal kernel $f^*$. Numerical calculations suggest that soft thresholding is nearly optimal. The paper also shows that as $\beta\to\infty$ the phase transition converges to the BBP limit. Numerical experiments corroborate the theory and illustrate detection and support recovery with adaptive thresholding and kernel selection.
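For orientation, the setting can be written out as below. This is a sketch under assumptions, not the paper's exact conventions. The kernel $f$ is assumed to act entrywise on the $\sqrt{n}$-rescaled off-diagonal entries of the sample covariance, with the diagonal handled separately. The spike is assumed to have entries $\pm 1/\sqrt{m}$ on its support.

```latex
\begin{gathered}
x_1,\dots,x_n \overset{\text{iid}}{\sim} \mathcal{N}\!\big(0,\; I_p + \lambda\, v v^\top\big),
\qquad \|v\|_2 = 1, \quad |\operatorname{supp}(v)| = m, \\
\hat\Sigma = \frac{1}{n}\sum_{k=1}^{n} x_k x_k^\top,
\qquad K_{ij} = \frac{1}{\sqrt{n}}\, f\big(\sqrt{n}\,\hat\Sigma_{ij}\big) \quad (i \neq j), \\
\sqrt{n}\,\hat\Sigma_{ij} \;\approx\;
\begin{cases}
\mathcal{N}\big(\lambda \sqrt{n}\, v_i v_j,\; 1\big), & i, j \in \operatorname{supp}(v), \\
\mathcal{N}(0,\; 1), & \text{otherwise.}
\end{cases}
\end{gathered}
```

Under these assumptions, $\lambda\sqrt{n}\,v_i v_j = \pm\lambda\sqrt{n}/m \to \pm\lambda/\beta$ when $m/\sqrt{n}\to\beta$. In the critical regime, the per-entry signal is therefore of the same order as the noise, which is why the choice of $f$ matters. Taking $f(z) = \operatorname{sign}(z)\,(|z|-\tau)_+$ gives $K_{ij} = \operatorname{sign}(\hat\Sigma_{ij})\,(|\hat\Sigma_{ij}| - \tau/\sqrt{n})_+$. This is soft thresholding of the sample covariance at level $\tau/\sqrt{n}$, as in covariance thresholding.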
Abstract
This work studies estimation of sparse principal components in high dimensions. Specifically, we consider a class of estimators based on kernel PCA, generalizing the covariance thresholding algorithm proposed by Krauthgamer et al. (2015). Focusing on Johnstone's spiked covariance model, we investigate the "critical" sparsity regime, where the sparsity level $m$, sample size $n$, and dimension $p$ each diverge with $m/\sqrt{n} \rightarrow \beta$ and $p/n \rightarrow \gamma$. Within this framework, we develop a fine-grained understanding of signal detection and recovery. Our results establish a detectability phase transition, analogous to the Baik–Ben Arous–Péché (BBP) transition. Above a threshold that depends on the kernel function, $\gamma$, and $\beta$, kernel PCA is informative. Conversely, below the threshold, kernel principal components are asymptotically orthogonal to the signal. Notably, above this detection threshold, we find that consistent support recovery is possible with high probability. Sparsity plays a key role in our analysis and leads to more nuanced phenomena than those found in related studies of kernel PCA with delocalized (dense) components. Finally, we identify optimal kernel functions for detection (and, consequently, for support recovery), and numerical calculations suggest that soft thresholding is nearly optimal.
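The following minimal NumPy sketch makes the estimator class concrete. It applies kernel PCA to the sample covariance and then thresholds the result for support recovery. It is an illustration under stated assumptions, not the authors' implementation. The assumptions are:

- The spike has entries ±1/√m on a random support.
- The kernel is applied to √n-rescaled entries, with the diagonal zeroed.
- Soft thresholding at `tau` serves as the kernel, which reduces to covariance thresholding at level tau/√n.
- The rule "keep coordinates with |v̂_i| ≥ 1/(2√m)" is a hypothetical support-recovery step.

The parameters (n = p = 400, m = 20, so γ = β = 1, with λ = 5 and τ = 2) are illustrative choices with a strong signal, not values from the paper.

```python
import numpy as np


def soft_threshold(z, tau):
    """Entrywise soft thresholding: sign(z) * max(|z| - tau, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def sample_spiked(n, p, m, lam, rng):
    """Draw n samples from N(0, I_p + lam * v v^T) with an m-sparse unit spike v.

    Assumption (illustrative): v has entries +-1/sqrt(m) on a uniformly random support.
    """
    support = rng.choice(p, size=m, replace=False)
    v = np.zeros(p)
    v[support] = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
    z = rng.standard_normal((n, p))          # isotropic noise
    g = rng.standard_normal((n, 1))          # one latent factor per sample
    x = z + np.sqrt(lam) * g * v             # rows have covariance I + lam v v^T
    return x, v, set(support.tolist())


def kernel_pca(x, f):
    """Top eigenvector of K_ij = f(sqrt(n) * S_ij) / sqrt(n), with K_ii set to 0."""
    n = x.shape[0]
    s = x.T @ x / n                          # sample covariance
    k = f(np.sqrt(n) * s) / np.sqrt(n)       # kernel applied entrywise
    np.fill_diagonal(k, 0.0)                 # drop the diagonal (simplifying assumption)
    _, eigvecs = np.linalg.eigh(k)           # eigenvalues in ascending order
    return eigvecs[:, -1]                    # eigenvector of the largest eigenvalue


def estimate_support(v_hat, m):
    """Hypothetical rule: keep coordinates above half the nominal level 1/sqrt(m)."""
    return set(np.flatnonzero(np.abs(v_hat) >= 0.5 / np.sqrt(m)).tolist())


rng = np.random.default_rng(0)
n, p, m, lam, tau = 400, 400, 20, 5.0, 2.0   # gamma = p/n = 1, beta = m/sqrt(n) = 1
x, v, true_support = sample_spiked(n, p, m, lam, rng)
v_hat = kernel_pca(x, lambda z: soft_threshold(z, tau))
overlap = abs(v_hat @ v)                     # |<v_hat, v>|, sign-invariant
support_hat = estimate_support(v_hat, m)
print(f"|<v_hat, v>| = {overlap:.3f}, support errors = {len(support_hat ^ true_support)}")
```

To try another entrywise kernel, change only the `f` argument. For example, `lambda z: z * (np.abs(z) > tau)` gives hard thresholding. Comparing the resulting overlaps across kernels and signal strengths is one informal way to see that the detection threshold depends on the kernel.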
