Sparse PCA: A New Scalable Estimator Based On Integer Programming

Kayhan Behdin, Rahul Mazumder

TL;DR

We propose a new estimator for SPCA that can be formulated as a Mixed Integer Program (MIP), and derive guarantees under departures from the spiked covariance model and for approximate solutions to the optimization problem.

Abstract

We consider the Sparse Principal Component Analysis (SPCA) problem under the well-known spiked covariance model. Recent work has shown that the SPCA problem can be reformulated as a Mixed Integer Program (MIP) and solved to global optimality, leading to estimators that are known to enjoy optimal statistical properties. However, prior MIP algorithms for SPCA appear to scale only to about a thousand features. In this paper, we propose a new estimator for SPCA which can be formulated as a MIP. Unlike earlier work, we make use of the underlying spiked covariance model and properties of the multivariate Gaussian distribution to arrive at our estimator. We establish statistical guarantees for our proposed estimator in terms of estimation error and support recovery. We derive guarantees under departures from the spiked covariance model, and for approximate solutions to the optimization problem. We propose a custom algorithm to solve the MIP, which scales better than off-the-shelf solvers, and demonstrate that our approach can be much more computationally attractive than earlier exact MIP-based approaches for the SPCA problem. Our numerical experiments on synthetic and real datasets show that our algorithms can address problems with up to 20,000 features in minutes, and generally result in favorable statistical properties compared to existing popular approaches for SPCA.

Paper Structure

This paper contains 40 sections, 25 theorems, 241 equations, 9 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Let $x\in \mathbb{R}^p$ be a random vector from $\mathcal{N}(0,G)$ where $G$ is a positive definite matrix. For any $j \in [p]$, the conditional distribution of $x_{j}$, given $x_{i}, i \neq j$, is given by $$x_j \mid \{x_i\}_{i\neq j} \sim \mathcal{N}\Big(\sum_{i \neq j}\beta^*_{i,j}x_i,\ ({\sigma_j^*})^2\Big),$$ where $\beta^*_{i,j}=-\frac{(G^{-1})_{j,i}}{(G^{-1})_{j,j}}$ and $({\sigma_j^*})^2=\frac{1}{(G^{-1})_{j,j}}$.
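
The identity in Lemma 1 can be checked numerically. The following is a minimal sketch, not code from the paper: the helper name `conditional_params`, the dimension `p`, the index `j`, and the randomly generated positive definite matrix `G` are all illustrative assumptions. It computes $\beta^*_{\cdot,j}$ and $(\sigma_j^*)^2$ from the precision matrix $G^{-1}$ and compares them with the standard Schur-complement formulas for Gaussian conditionals.

```python
import numpy as np

def conditional_params(G, j):
    """Coefficients and residual variance of x_j given the other coordinates,
    computed from the precision matrix as in Lemma 1 (hypothetical helper)."""
    Theta = np.linalg.inv(G)             # precision matrix G^{-1}
    beta = -Theta[j, :] / Theta[j, j]    # beta*_{i,j} = -(G^{-1})_{j,i} / (G^{-1})_{j,j}
    beta[j] = 0.0                        # x_j is not regressed on itself
    sigma2 = 1.0 / Theta[j, j]           # (sigma_j^*)^2 = 1 / (G^{-1})_{j,j}
    return beta, sigma2

# Illustrative positive definite covariance (an assumption, not the paper's data).
rng = np.random.default_rng(0)
p, j = 6, 2
A = rng.standard_normal((p, p))
G = A @ A.T + p * np.eye(p)

beta, sigma2 = conditional_params(G, j)

# Cross-check with the textbook conditional-Gaussian formulas:
# mean coefficients G_rr^{-1} G_rj and variance G_jj - G_jr G_rr^{-1} G_rj.
rest = [i for i in range(p) if i != j]
G_rr = G[np.ix_(rest, rest)]
G_rj = G[rest, j]
beta_schur = np.linalg.solve(G_rr, G_rj)
sigma2_schur = G[j, j] - G_rj @ beta_schur

print(np.allclose(beta[rest], beta_schur), np.isclose(sigma2, sigma2_schur))
```

Both routes give the same conditional mean coefficients and variance, so each coordinate of a Gaussian vector can be read as a linear regression on the remaining coordinates, with parameters taken directly from $G^{-1}$.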

Figures (9)

  • Figure 1: Estimation and support recovery error achieved by different methods on synthetic data with $p=10000$ and different values of $n$ (along the x-axis), as discussed in the text (Section \ref{varyingn}). The left panels show results with $s=5$ and the right panels with $s=10$. The top panels compare estimation performance and the bottom panels compare support recovery. Our proposed approach results in high-quality estimation performance and perfect support recovery.
  • Figure 2: Numerical results for the synthetic dataset with $p=10000,n=7500$ in Section \ref{varyings}. The left panel shows the estimation performance and the right panel shows the support recovery performance.
  • Figure 3: Numerical results for the synthetic dataset with $p=20000$ in Section \ref{miqpvsmisocp}. The left panel shows the statistical performance for $s=5$. The right panel shows the statistical performance for $s=10$.
  • Figure 4: Comparison of estimated PCs by different algorithms for the real dataset in Section \ref{real}. The left panel shows the results for $s=4$ and the right panel shows the results for $s=5$. In both cases, SPCA-SLS reaches an optimality gap of less than $12\%$ and SPCA-SLSR reaches an optimality gap of less than $10\%$ after 10 minutes.
  • Figure B.1: Experiment results for the first setup in Appendix \ref{app:add-exp}.
  • ...and 4 more figures

Theorems & Definitions (54)

  • Lemma 1
  • Lemma 2
  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Remark 4
  • Theorem 3
  • ...and 44 more