Sparse PCA With Multiple Components

Ryan Cory-Wright, Jean Pauphilet

TL;DR

This paper tackles sparse PCA with multiple components, a problem that demands both sparsity and mutual orthogonality across several PCs. It develops three complementary approaches: (i) a semidefinite relaxation framework based on rank-constrained reformulations, with strong inequalities and rounding to disjoint supports; (ii) a Lagrangian decomposition that yields tight upper bounds and a practical iterative algorithm; and (iii) a combinatorial bound based on generalized Gershgorin ideas, with a MILP relaxation to guide rounding. Together, these methods provide certifiably near-optimal solutions for datasets with hundreds to thousands of features and small component counts (r ∈ {2, 3}), and they significantly outperform deflation-based techniques, which often violate orthogonality. Empirical results on UCI and synthetic data show average bound gaps of a few percent and robust accuracy in recovering sparse, orthogonal PCs, with tractable computation times. Overall, the work delivers practical, near-optimal sparse PCA for multiple components with rigorous certificates, enabling interpretable, high-variance representations in high-dimensional settings.

Abstract

Sparse Principal Component Analysis (sPCA) is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. This involves solving a sparsity- and orthogonality-constrained convex maximization problem, which is extremely computationally challenging. Most existing works address sparse PCA via methods (such as iteratively computing one sparse PC and deflating the covariance matrix) that do not guarantee the orthogonality, let alone the optimality, of the resulting solution when we seek multiple mutually orthogonal PCs. We challenge this status quo by reformulating the orthogonality conditions as rank constraints and optimizing over the sparsity and rank constraints simultaneously. We design tight semidefinite relaxations to supply high-quality upper bounds, which we strengthen via additional second-order cone inequalities when each PC's individual sparsity is specified. Further, we derive a combinatorial upper bound on the maximum amount of variance explained as a function of the support. We exploit these relaxations and bounds to propose exact methods and rounding mechanisms that, together, obtain solutions with a bound gap on the order of 0%-15% for real-world datasets with p = 100s or 1000s of features and r ∈ {2, 3} components. Numerically, our algorithms match (and sometimes surpass) the best-performing methods in terms of fraction of variance explained and systematically return PCs that are sparse and orthogonal. In contrast, we find that existing methods like deflation return solutions that violate the orthogonality constraints, even when the data is generated according to sparse orthogonal PCs. Altogether, our approach solves sparse PCA problems with multiple components to certifiable (near) optimality in a practically tractable fashion.
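
To make the deflation baseline mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Hotelling-style deflation (one of several possible schemes), finds each sparse PC by brute-force enumeration of supports (only viable for very small p), and then measures how far the two PCs are from orthonormal. The helper names `best_sparse_pc`, `hotelling_deflate`, and `evaluate` are hypothetical.

```python
from itertools import combinations

import numpy as np


def best_sparse_pc(Sigma, k):
    """Leading k-sparse PC by enumerating all supports of size k (tiny p only)."""
    p = Sigma.shape[0]
    best_val, best_x = -np.inf, None
    for support in combinations(range(p), k):
        idx = list(support)
        # Leading eigenpair of the principal submatrix indexed by the support.
        vals, vecs = np.linalg.eigh(Sigma[np.ix_(idx, idx)])
        if vals[-1] > best_val:
            best_val = vals[-1]
            best_x = np.zeros(p)
            best_x[idx] = vecs[:, -1]
    return best_x


def hotelling_deflate(Sigma, x):
    """Assumed deflation scheme: subtract the variance captured along x."""
    return Sigma - (x @ Sigma @ x) * np.outer(x, x)


def evaluate(Sigma, U, tol=1e-8):
    """Summarize a candidate solution whose columns are the PCs."""
    gram = U.T @ U
    return {
        # A meaningful total only when the columns of U are orthonormal.
        "variance_explained": float(np.trace(U.T @ Sigma @ U)),
        "support_sizes": [int(np.sum(np.abs(U[:, j]) > tol)) for j in range(U.shape[1])],
        # Largest entrywise deviation of U^T U from the identity.
        "orthogonality_violation": float(np.max(np.abs(gram - np.eye(U.shape[1])))),
    }


rng = np.random.default_rng(0)
A = rng.standard_normal((50, 6))
Sigma = A.T @ A / 50  # sample covariance-like matrix, illustrative data only

x1 = best_sparse_pc(Sigma, 3)
x2 = best_sparse_pc(hotelling_deflate(Sigma, x1), 3)
report = evaluate(Sigma, np.column_stack([x1, x2]))
print(report)
```

Nothing in this procedure forces `x1` and `x2` to be orthogonal when their supports overlap, which is the failure mode the abstract attributes to deflation. Two PCs with disjoint supports, by contrast, are orthogonal by construction.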

Paper Structure (61 sections, 15 theorems, 85 equations, 4 figures, 16 tables)


Key Result

Lemma 1

(Overton and Womersley 1992) $\mathrm{Conv}(\mathcal{Y}_n)=\{\bm{P}: 0 \preceq \bm{P} \preceq \mathbb{I}\}$ and $\mathrm{Conv}(\mathcal{Y}_n^k)=\{\bm{P}: 0 \preceq \bm{P} \preceq \mathbb{I}, \mathrm{tr}({\bm{P}}) \leq k\}$. Moreover, the extreme points of $\mathrm{Conv}(\mathcal{Y}_n)$ are $\mathcal{Y}_n$, and the
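
A small numerical check can illustrate the convex-hull description in Lemma 1. This sketch assumes, since the excerpt does not define them, that $\mathcal{Y}_n$ denotes the $n \times n$ orthogonal projection matrices and $\mathcal{Y}_n^k$ those with trace at most $k$. The helpers `random_projection`, `in_hull`, and `is_idempotent` are hypothetical names introduced for illustration.

```python
import numpy as np


def random_projection(n, k, rng):
    """Rank-k orthogonal projection Q Q^T built from a random orthonormal basis."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return Q @ Q.T


def in_hull(P, k, tol=1e-8):
    """Check 0 <= P <= I in the PSD order and tr(P) <= k."""
    eig = np.linalg.eigvalsh((P + P.T) / 2)
    return bool(eig.min() >= -tol and eig.max() <= 1 + tol and np.trace(P) <= k + tol)


def is_idempotent(P):
    """A symmetric matrix is an orthogonal projection exactly when P @ P == P."""
    return bool(np.allclose(P @ P, P))


rng = np.random.default_rng(0)
n, k = 6, 2
P1 = random_projection(n, k, rng)
P2 = random_projection(n, k, rng)
mix = 0.5 * (P1 + P2)  # convex combination of two projections

# Projections lie in the hull and are idempotent; their average stays in the
# hull but is generally no longer a projection (so it is not an extreme point).
print(in_hull(P1, k), is_idempotent(P1))
print(in_hull(mix, k), is_idempotent(mix))

# Consequence used by spectral relaxations: a linear objective tr(Sigma P)
# over the hull is maximized at a projection, with value equal to the sum of
# the k largest eigenvalues of Sigma.
A = rng.standard_normal((40, n))
Sigma = A.T @ A / 40
vals, vecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
P_star = vecs[:, -k:] @ vecs[:, -k:].T
bound = float(vals[-k:].sum())
print(np.trace(Sigma @ P_star), bound, np.trace(Sigma @ P1))
```

The last line compares the objective at the top-$k$ eigenspace projection, the eigenvalue bound, and the objective at an arbitrary rank-$k$ projection, which cannot exceed the bound.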

Figures (4)

  • Figure 1: Variance explained (left panel) and feasibility violation (right panel) on synthetic instances of sparse PCA with two 20-sparse PCs with partially overlapping support. Results are averaged over 20 replications.
  • Figure 2: Accuracy (left panel) and joint support size (right panel) for the recovery of $\operatorname{supp}(\bm{x}_1) \cup \operatorname{supp}(\bm{x}_2)$, on synthetic instances of sparse PCA with two 20-sparse PCs with partially overlapping support. Results are averaged over 20 replications.
  • Figure 3: Asymmetry of sparsity budget allocation (the higher the relative KL divergence, the further away $(k_1,k_2,k_3)$ is from a symmetric allocation) vs. proportion of correlation explained on the pitprops (left panel, $k=15$, $r=3$) and ionosphere (right panel, $k=30$, $r=3$) datasets. For the proportion of correlation explained, we report both an upper bound (obtained from solving our semidefinite relaxation) and a lower bound (obtained from the solution of Algorithm 2). Note that the normalizing constant for the KL divergence is different for each dataset, as the value of $k$ is different.
  • Figure EC.1: Symmetry of sparsity budget allocation vs. proportion of correlation in the dataset explained for pitprops $k=30$ (top left), ionosphere $k=15$ (top right), geographical $k=15$ (middle left), geographical $k=30$ (middle right), communities $k=15$ (bottom left), and communities $k=30$ (bottom right). Note that we normalize the KL divergence for $k=15$ and $k=30$ separately.

Theorems & Definitions (26)

  • Lemma 1
  • Theorem 1
  • Remark 1
  • Remark 2
  • Proposition 1
  • Theorem 2
  • Proposition 2
  • Remark 3
  • Theorem 3
  • Proposition 3
  • ...and 16 more