Do algorithms and barriers for sparse principal component analysis extend to other structured settings?

Guanyi Wang; Mengqi Lou; Ashwin Pananjady

TL;DR

The paper analyzes sparse and structured PCA under a spiked Wishart model where the signal lies in a union of linear subspaces. It develops a unified statistical-computational framework, deriving geometry-dependent fundamental limits and establishing a locally convergent projected power method with an exact projection oracle, plus initialization schemes. It provides end-to-end results for path- and tree-sparse PCA, including explicit convergence guarantees and hardness results via average-case reductions from secret-leakage planted clique, showing that additional structure yields only modest computational gains. Overall, the work demonstrates that the same qualitative phenomena observed in vanilla sparse PCA largely extend to structured settings, guiding algorithm design and clarifying when computational hardness persists. These insights inform both theory and practice in high-dimensional structured PCA and model-based sparse representations.

Abstract

We study a principal component analysis problem under the spiked Wishart model in which the structure in the signal is captured by a class of union-of-subspace models. This general class includes vanilla sparse PCA as well as its variants with graph sparsity. With the goal of studying these problems under a unified statistical and computational lens, we establish fundamental limits that depend on the geometry of the problem instance, and show that a natural projected power method exhibits local convergence to the statistically near-optimal neighborhood of the solution. We complement these results with end-to-end analyses of two important special cases given by path and tree sparsity in a general basis, showing initialization methods and matching evidence of computational hardness. Overall, our results indicate that several of the phenomena observed for vanilla sparse PCA extend in a natural fashion to its structured counterparts.
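To make the projected power method mentioned in the abstract concrete, here is a minimal sketch for the vanilla sparse PCA special case. It is an illustration under stated assumptions, not the paper's algorithm as specified: the function names `project_k_sparse` and `projected_power_method`, the fixed iteration count, and the use of hard thresholding as the exact projection are all choices made for this example. The demonstration uses the noiseless population covariance $I + \lambda \bm{v}_* \bm{v}_*^\top$ rather than a sample covariance, so the iterates converge to $\bm{v}_*$ itself.

```python
import numpy as np


def project_k_sparse(u, k):
    """Project onto unit-norm k-sparse vectors: keep the k largest-magnitude
    entries of u, zero out the rest, and renormalize.
    (Hypothetical helper; stands in for the exact projection oracle.)"""
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    keep = np.argsort(np.abs(u))[-k:]  # indices of the k largest |u_i|
    out[keep] = u[keep]
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out


def projected_power_method(sigma_hat, v0, k, num_iters=50):
    """Alternate a power step (multiply by sigma_hat) with a projection
    back onto the k-sparse unit sphere. num_iters is an assumed default."""
    v = project_k_sparse(v0, k)
    for _ in range(num_iters):
        v = project_k_sparse(sigma_hat @ v, k)
    return v


# Noiseless illustration (assumed setup): population covariance with a
# k-sparse spike, and an initialization that overlaps the true support.
d, k, lam = 20, 4, 3.0
v_star = np.zeros(d)
v_star[:k] = 1.0 / np.sqrt(k)
sigma = np.eye(d) + lam * np.outer(v_star, v_star)

v0 = np.zeros(d)
v0[0] = 1.0  # positively correlated with v_star
v_hat = projected_power_method(sigma, v0, k)
```

For path or tree sparsity, only the projection step would change: `project_k_sparse` would be replaced by a projection onto the corresponding union of subspaces, while the power step stays the same.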

Paper Structure (60 sections, 21 theorems, 180 equations, 1 figure, 4 algorithms)

Introduction
Contributions and organization
Related work
Optimization algorithms for sparse PCA
Statistical and computational limits of sparse PCA
Structured PCA and related problems
Problem setting, background, and examples
Examples of union-of-linear structures in Section \ref{sec:setting-background}
Example 1: Tree-Sparse PCA
Example 2: Path-Sparse PCA
Notation
General results
Fundamental limits of estimation
A locally convergent projected power method
Initialization method
...and 45 more sections

Key Result

Theorem 1

Suppose the union-of-linear structures condition in Definition cond:linear-structure holds. (a) Let $\widehat{\bm{v}}_{\mathsf{ES}}$ be defined in equation exhaustive-search-estimator. Without loss of generality, suppose $\langle \bm{v}_*, \widehat{\bm{v}}_{\mathsf{ES}} \rangle \geq 0$. Then for all Here, the infimum is taken over all measurable functions of the observations $\{ \bm{x}_i \}_{i = 1

Figures (1)

Figure 1: Given the sample dimension $d = 2^L - 1$, sparsity $k$, and eigengap $\lambda$, we choose a particular tree-sparse support set $T_* \in \mathcal{T}^k$ and set the ground-truth vector $\bm{v}_*$ as $[\bm{v}_*]_i = \pm \frac{1}{\sqrt{k}}$ if $i \in T_*$ and $[\bm{v}_*]_i = 0$ if $i \notin T_*$. For each tuple $(\lambda,d,k,n)$ and each trial, we generate samples from the distribution $\mathcal{D}(\lambda, \bm{v}_*)$ based on the Wishart model in Section \ref{sec:setting-background}, run Algorithm \ref{alg:initialization} for initialization, and then run Algorithm \ref{alg:PPM} with either general $k$-sparse projection or tree-sparse projection for local refinement. Each experiment is repeated over 50 independent trials. We set $\lambda = 3$ and choose $(d,k) \in \{(255,9),(511,10),(1023,13)\}$. For each choice of $(d,k)$, we simulate every $n \in \{20,40,\dots,200\}$. In the first row, we plot the $\ell_2$ distance $\|\bm{v}_T - \bm{v}_*\|_2$ versus the number of samples $n$. The two curves in each panel show the average over the 50 trials for the method with general $k$-sparse projection and for the method with tree-sparse projection; the shaded regions represent the empirical standard deviations over the 50 trials. For a given small sample size, tree-sparse projection achieves a smaller estimation error than general $k$-sparse projection. In the second row, we plot the success probability of support recovery for the two methods versus the number of samples $n$. The support of $\bm{v}_*$ is considered successfully recovered if $\mathsf{supp}(\bm{v}_T) = T_*$, and the success probability is the fraction of the 50 trials in which the support is recovered. For a fixed small sample size, tree-sparse projection achieves a higher success probability of support recovery than the vanilla $k$-sparse projection.
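The data-generation step described in the caption can be sketched as follows. This is a hedged illustration, not the authors' experiment code. It assumes that $\mathcal{D}(\lambda, \bm{v}_*)$ is the Gaussian $\mathcal{N}(\bm{0}, I + \lambda \bm{v}_* \bm{v}_*^\top)$, that a tree-sparse support is a connected subtree containing the root of a complete binary tree with $d = 2^L - 1$ nodes, and that nodes use heap indexing (children of node $i$ are $2i+1$ and $2i+2$). The function names are hypothetical.

```python
import numpy as np


def random_rooted_subtree(num_levels, k, rng):
    """Grow a size-k connected subtree containing the root (node 0) of a
    complete binary tree with 2**num_levels - 1 nodes. Assumes k <= d.
    (Hypothetical helper; heap indexing is an assumption of this sketch.)"""
    d = 2 ** num_levels - 1
    support = {0}
    frontier = [c for c in (1, 2) if c < d]  # children of the root
    while len(support) < k:
        node = frontier.pop(int(rng.integers(len(frontier))))
        support.add(node)
        # The new node's children become eligible for selection.
        frontier.extend(c for c in (2 * node + 1, 2 * node + 2) if c < d)
    return sorted(support)


def sample_spiked_wishart(n, v_star, lam, rng):
    """Draw n rows x_i = z_i + sqrt(lam) * g_i * v_star with z_i ~ N(0, I)
    and g_i ~ N(0, 1), so that Cov(x_i) = I + lam * v_star v_star^T."""
    d = v_star.shape[0]
    z = rng.standard_normal((n, d))
    g = rng.standard_normal((n, 1))
    return z + np.sqrt(lam) * g * v_star


def support_recovered(v_hat, true_support):
    """Success criterion from the caption: supp(v_hat) equals T_*."""
    return set(np.flatnonzero(v_hat).tolist()) == set(true_support)


# One configuration from the caption: d = 255 (L = 8), k = 9, lambda = 3.
rng = np.random.default_rng(0)
num_levels, k, lam, n = 8, 9, 3.0, 200
d = 2 ** num_levels - 1

support = random_rooted_subtree(num_levels, k, rng)
v_star = np.zeros(d)
v_star[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
X = sample_spiked_wishart(n, v_star, lam, rng)
```

An estimate $\bm{v}_T$ produced from `X` would then be scored with `np.linalg.norm(v_T - v_star)` and `support_recovered(v_T, support)`, averaged over independent trials.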

Theorems & Definitions (33)

Definition 1
Remark 1
Definition 2
Theorem 1
Definition 3: Exact projection
Definition 4: Good region
Theorem 2
Theorem 3
Corollary 1
Corollary 2
...and 23 more
