Sparse Principal Component Analysis with Energy Profile Dependent Sample Complexity

Mengchu Xu, Jian Wang, Yonina C. Eldar

TL;DR

The paper tackles sparse PCA in high-dimensional, sample-scarce regimes where spike energy is non-uniform. It introduces Spectral Energy Pursuit (SEP), an iterative, computationally efficient method that uses a structure function \(s(p)\) to adapt to the spike’s energy profile. SEP succeeds with a sample size \(m\) of order \(\max_{1\le p\le k} p\,s^2(p)\,\log n\), which matches the classical \(k^2\log n\) for flat spikes and improves toward \(k\log n\) as energy concentrates. A lightweight post-processing step, a single iteration of the truncated power method with a centered operator, further guarantees a uniform statistical error bound. Empirical results across flat, power-law, and exponential signals show that SEP adapts without tuning and outperforms existing approaches, especially on non-flat profiles.

Abstract

We study sparse principal component analysis in the high-dimensional, sample-limited regime, aiming to recover a leading component supported on a few coordinates. Despite extensive progress, most methods and analyses are tailored to the flat-spike case, offering little guidance when spike energy is unevenly distributed across the support. Motivated by this, we propose Spectral Energy Pursuit (SEP), an effective iterative scheme that repeatedly screens and reselects coordinates, with a sample complexity that adapts to the energy profile. We develop our framework around a structure function \(s(p)\) that quantifies how spike energy accumulates over its top \(p\) entries. We establish that SEP succeeds with a sample size of order \(\max_{1\le p\le k} p\,s^2(p)\,\log n\), which matches the classical \(k^2\log n\) sample complexity for flat spikes and improves toward the \(k\log n\) regime as the profile becomes more concentrated. As a lightweight post-processing, a single truncated power iteration is proven to enable the final estimator to attain a uniform statistical error bound. Empirical simulations across flat, power-law, and exponential signals validate that SEP adapts to profile structure without tuning and outperforms existing algorithms.

Paper Structure

This paper contains 36 sections, 14 theorems, 114 equations, 7 figures, 1 table, 2 algorithms.

Key Result

Proposition 1

There exist absolute constants $C,c>0$ such that, with probability at least $1-n^{-c}$, it holds that for every $p\in[n]$ and every index set $S\subset[n]$ with $|S|=p$. This implies $\mathbb{P}(\mathcal{E})\ge 1-n^{-c}$.

Figures (7)

  • Figure 1: Diagonal Thresholding algorithm: selects the support indices based on the largest diagonal entries of the centered covariance shift matrix $\hat{\boldsymbol{\Gamma}}$. The success condition depends on the smallest nonzero entry $v_{(k)}$.
  • Figure 2: Single-peak-based algorithm: selects the support indices based on the column of the centered covariance shift matrix $\hat{\boldsymbol{\Gamma}}$ corresponding to the largest diagonal entry. The success condition depends on the largest and smallest nonzero entries $v_{(1)}$ and $v_{(k)}$.
  • Figure 3: Spectral Energy Pursuit algorithm: at each round $p$, SEP forms the vector $\mathbf{u}^{(p)}=\hat{\boldsymbol{\Gamma}} \hat{\mathbf{e}}^{(p)}$ by multiplying the centered covariance shift matrix $\hat{\boldsymbol{\Gamma}}$ with the vector $\hat{\mathbf{e}}^{(p)}$, the top eigenvector of $\hat{\boldsymbol{\Gamma}}_{S^{(p)}, S^{(p)}}$. The next support estimate $S^{(p+1)}$ is obtained by selecting the top-$(p+1)$ entries of $\mathbf{u}^{(p)}$. The success condition depends on the signal energy structure function $s(p)$.
  • Figure 4: $s(p)$ for $k=40$ under the three profiles. The power-law and exponential curves start lower at small $p$, reflecting stronger early concentration.
  • Figure 5: Direction error vs $m$ across three profiles (curves: trial mean; shaded bands: $\pm1$ standard deviation over $1000$ trials).
  • ...and 2 more figures
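
The selection rules in the captions of Figures 1 and 3 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the captions do not state: `centered_covariance` takes $\hat{\boldsymbol{\Gamma}}$ to be the sample covariance minus the identity (unit-variance noise), SEP is started from the index of the largest diagonal entry, and "top" entries are ranked by absolute value. The function names are hypothetical.

```python
import numpy as np


def centered_covariance(X):
    """Assumed form of Gamma-hat: sample covariance minus identity.

    X has shape (m, n): m samples in n dimensions. Subtracting the identity
    assumes unit-variance noise; the captions do not define Gamma-hat.
    """
    m, n = X.shape
    return X.T @ X / m - np.eye(n)


def diagonal_thresholding(gamma, k):
    """Figure 1 rule: keep the k indices with the largest diagonal entries."""
    return np.sort(np.argsort(np.diag(gamma))[-k:])


def sep_support(gamma, k):
    """Figure 3 rule: grow the support estimate by one index per round."""
    n = gamma.shape[0]
    # Assumption: S^(1) is the index of the largest diagonal entry.
    support = np.array([int(np.argmax(np.diag(gamma)))])
    for p in range(1, k):
        # Top eigenvector of the principal submatrix Gamma[S, S].
        sub = gamma[np.ix_(support, support)]
        _, vecs = np.linalg.eigh(sub)  # eigenvalues in ascending order
        e_hat = np.zeros(n)
        e_hat[support] = vecs[:, -1]
        # u^(p) = Gamma e^(p); reselect the top-(p+1) entries.
        u = gamma @ e_hat
        # Assumption: entries are ranked by absolute value.
        support = np.argsort(np.abs(u))[-(p + 1):]
    return np.sort(support)
```

Given an `(m, n)` data matrix `X`, a call such as `sep_support(centered_covariance(X), k)` returns a sorted array of `k` candidate support indices. Note that each round reselects the whole support from $\mathbf{u}^{(p)}$ rather than appending a single index, so an early mistake can be corrected in a later round.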

Theorems & Definitions (29)

  • Definition 1: Signal-energy structure function
  • Definition 2: Direction metric
  • Proposition 1: Principal-submatrix spectral bound
  • Theorem 1: Profile-adaptive sample complexity for direction estimation
  • Proposition 2: Power-law signal profiles: interpolation between flat and concentrated regimes
  • Theorem 2: TPower after $T$ iterations: uniform statistical upper bound
  • Theorem 3: Superiority of SEP sample complexity
  • Proposition 3: Initialization
  • Proposition 4: Inductive step: energy lower bound preservation
  • Proposition 5: Initializer alignment lower bound
  • ...and 19 more
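
Theorem 2 concerns the truncated power method (TPower) used as post-processing. A minimal sketch of the standard TPower update is given below: multiply by the matrix, keep the `k` largest-magnitude entries, and renormalize. It assumes the operator is the centered matrix $\hat{\boldsymbol{\Gamma}}$ and that the starting vector `x0` comes from an earlier support-selection stage; the function names and the `num_iters` parameter are hypothetical, and the paper's exact variant may differ.

```python
import numpy as np


def tpower_step(gamma, x, k):
    """One truncated power update: multiply, keep top-k entries, normalize."""
    y = gamma @ x
    keep = np.argsort(np.abs(y))[-k:]  # indices of the k largest magnitudes
    z = np.zeros_like(y)
    z[keep] = y[keep]
    norm = np.linalg.norm(z)
    if norm == 0.0:
        # Degenerate case: leave the iterate unchanged rather than divide by 0.
        return x
    return z / norm


def tpower(gamma, x0, k, num_iters=1):
    """Run num_iters truncated power updates from a nonzero start x0."""
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    for _ in range(num_iters):
        x = tpower_step(gamma, x, k)
    return x
```

With `num_iters=1` this corresponds to the single post-processing iteration described in the abstract. The output is a unit vector with at most `k` nonzero entries.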