
Sparse Principal Component Analysis with Non-Oblivious Adversarial Perturbations

Yuqing He, Guanyi Wang, Yu Yang

Abstract

Sparse Principal Component Analysis (sparse PCA) is a fundamental dimension-reduction tool that enhances interpretability in various high-dimensional settings. An important variant of sparse PCA studies the scenario in which samples are adversarially perturbed. Most existing statistical studies of this variant focus on recovering the ground truth and on verifying the robustness of classical algorithms when the given samples are corrupted by oblivious adversarial perturbations. In contrast, this paper aims to find a robust sparse principal component that maximizes the variance of the given samples corrupted by non-oblivious adversarial perturbations, a problem we call sparse PCA with Non-Oblivious Adversarial Perturbations (sparse PCA-NOAP). Specifically, we introduce a general formulation for the proposed sparse PCA-NOAP. We then derive Mixed-Integer Programming (MIP) reformulations that upper bound it with provable worst-case guarantees when adversarial perturbations are controlled by two typical norms, i.e., the $\ell_{2 \rightarrow \infty}$-norm (sample-wise $\ell_2$-norm perturbation) and the $\ell_{1 \rightarrow 2}$-norm (feature-wise $\ell_2$-norm perturbation). Moreover, when samples are drawn from the spiked Wishart model, we show that the proposed MIP reformulations ensure vector recovery properties under a more general parameter region than existing results. Numerical simulations are also provided to validate the theoretical findings and demonstrate the accuracy of the proposed formulations.

Paper Structure

This paper contains 32 sections, 13 theorems, 85 equations, 2 figures, 2 algorithms.

Key Result

Proposition 1

For any $\bm{v} \in \mathcal{V}_k$, the inner minimization problem $\min_{\|\bm{E}\| \leq \rho} \frac{1}{n} \|\bm{X} \bm{v} + \bm{E}\bm{v}\|_2^2$ can be written as $\frac{1}{n} \| \bm{X}\bm{v} - {\tt Proj}_{\mathcal{F}(\bm{v}; \|\cdot\|, \rho)} (\bm{X} \bm{v}) \|_2^2$, where set $\mathcal{F}(\bm{v}
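The definition of $\mathcal{F}$ is cut off in this excerpt, so the following is only a minimal sketch of Proposition 1 under stated assumptions. It assumes the sample-wise case, where $\|\bm{E}\|_{2 \rightarrow \infty} \leq \rho$ bounds the $\ell_2$-norm of every row of $\bm{E}$, and it assumes that the set of reachable vectors $\bm{E}\bm{v}$ is then the box $\{\bm{u} : \|\bm{u}\|_\infty \leq \rho \|\bm{v}\|_2\}$, so the projection reduces to entry-wise clipping. The function names `inner_min_samplewise` and `worst_case_E_samplewise` are hypothetical and do not come from the paper.

```python
import numpy as np


def inner_min_samplewise(X, v, rho):
    """Assumed closed form of min_{||E||_{2->inf} <= rho} (1/n) ||Xv + Ev||_2^2.

    Assumption (not stated in the excerpt): the set F is the box
    {u : ||u||_inf <= rho * ||v||_2}, so projecting Xv onto it is a clip.
    """
    n = X.shape[0]
    z = X @ v
    radius = rho * np.linalg.norm(v)
    proj = np.clip(z, -radius, radius)  # entry-wise projection onto the box
    return np.sum((z - proj) ** 2) / n


def worst_case_E_samplewise(X, v, rho):
    """Build one feasible E that attains the value above.

    Row i of E is -c_i * v / ||v||_2 with |c_i| <= rho, so every row has
    l2-norm at most rho and (Ev)_i = -clip(x_i^T v, -radius, radius).
    """
    z = X @ v
    v_norm = np.linalg.norm(v)
    radius = rho * v_norm
    c = np.clip(z, -radius, radius) / v_norm
    return -np.outer(c, v / v_norm)
```

Under these assumptions, each sample's projection $\bm{x}_i^\top \bm{v}$ is shrunk toward zero by at most $\rho \|\bm{v}\|_2$, which is why the perturbed variance becomes a sum of squared soft-thresholded projections.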

Figures (2)

  • Figure 1: Numerical results validating Theorem \ref{thm:strong-weak-signal}. We plot the cosine of the angle between the true and computed solutions, for the strong and weak signal parts, vs. the normalized perturbation parameter $\bar{\rho}$ under a particular parameter setting $(\bar{d},k,n,\lambda,N,r)$. The first and second rows report the numerical results with $n = 100$ and $n = 500$, respectively. For each value of the normalized perturbation parameter $\bar{\rho}$, we generate 10 independent sample matrices based on the above parameter setting. Over these 10 independent trials, the solid curve in each panel shows the averaged cosine values of the strong signal part, the dashed curve shows the averaged cosine values of the weak signal part, and the shaded regions represent the empirical standard deviations. The second, third, and fourth columns show the cosine values for the corresponding methods. The first column, denoted by "best", shows the cosine values for the solutions with the best objective value of \ref{eq:pop-featurewise-spca} among all three methods. We observe three regimes. When $\bar{\rho}$ is relatively small, both the strong and weak parts are recovered with large cosine values, which matches the recovery stage described in Theorem \ref{thm:strong-weak-signal}. As $\bar{\rho}$ increases, the cosine value of the weak signal part drops sharply to zero while the strong signal part retains a relatively high cosine value, as described in the robust stage of Theorem \ref{thm:strong-weak-signal}. Finally, once $\bar{\rho}$ grows beyond some threshold, the adversarial perturbations are too large for the solution to capture any information about the ground truth.
  • Figure 2: Numerical simulations on strong and weak signals. This figure compares the gap ratio, objective function value, and support recovery rate vs. the normalized perturbation parameter $\bar{\rho}$ for PPM and the two proposed MIP formulations under the two parameter settings described in Section \ref{sec:numerical}. The first and second rows consist of panels for $n = 100$ and $n = 500$, respectively. In the first column, we plot the gap ratio (gap) vs. the normalized perturbation parameter $\bar{\rho}$. The three solid curves in each panel show the values of the corresponding methods averaged over 10 independent trials; the shaded regions represent the empirical standard deviations over those trials. A smaller gap means that the upper bound is tighter. The proposed MIP method outperforms the other two methods, which validates our theoretical analysis. The gap of the proposed MIP method is close to 0, indicating that the MIP formulation is a very tight upper bound for the problem. The second column plots the objective function values (obj) of \ref{eq:emp-adv-spca-2} vs. the normalized perturbation parameter $\bar{\rho}$. Similarly, the four solid curves in each panel show the values of the corresponding methods averaged over 10 independent trials, and the shaded regions represent the empirical standard deviations over those trials. The proposed MIP method performs best when the perturbation is large, indicating that it is a more resilient method for finding the optimal solution. The third column plots the support recovery rate vs. the normalized perturbation parameter $\bar{\rho}$. In each panel, the three solid curves show the strong signal recovery rate of each method averaged over 10 independent trials, while the three dashed curves show the corresponding averaged weak signal recovery rate. The shaded regions indicate empirical standard deviations across these trials. Both the strong and weak signal recovery rates of PPM are consistently higher than those of the two proposed methods, suggesting that PPM could be an effective approach for recovering the ground truth in this problem setting. A likely explanation for PPM's strong performance lies in its strategy of retaining the $k$ largest indices of the gradient at each iteration. Given that the gradient of \ref{eq:emp-adv-spca-2} is $2 \left(\|\bm{X} \bm{v}\|_2 - \rho \|\bm{v}\|_1 \right) \left(\frac{\bm{X}^{\top} \bm{X} \bm{v}}{\|\bm{X} \bm{v}\|_2} - \rho \, {\tt sgn} \left(\bm{v} \right) \right)$, the term $\rho \, {\tt sgn} \left(\bm{v} \right)$ becomes dominant when $\rho$ is large. Consequently, the support at each iteration tends to align with that of the previous iteration, often leading to a final computed support equal to the initial support, i.e., the support of $\bm{v}^{{\tt spca}}$, which corresponds to the true support with high probability. This observation explains why PPM can retain the true support even under high perturbations.
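The gradient expression quoted in the Figure 2 caption can be checked numerically. The sketch below assumes that the objective in \ref{eq:emp-adv-spca-2} is $f(\bm{v}) = \left(\|\bm{X}\bm{v}\|_2 - \rho\|\bm{v}\|_1\right)^2$, which is inferred from the quoted gradient rather than stated in this excerpt (any $1/n$ scaling is omitted). It also assumes that "the $k$ largest indices of the gradient" means the $k$ entries of largest magnitude. The names `objective`, `gradient`, and `top_k_support` are hypothetical, and this is not the authors' PPM implementation.

```python
import numpy as np


def objective(X, v, rho):
    """Assumed objective f(v) = (||Xv||_2 - rho * ||v||_1)^2."""
    return (np.linalg.norm(X @ v) - rho * np.linalg.norm(v, 1)) ** 2


def gradient(X, v, rho):
    """Gradient quoted in the caption; valid where Xv != 0.

    At zero entries of v, np.sign returns 0, which picks one subgradient.
    """
    Xv = X @ v
    norm_Xv = np.linalg.norm(Xv)
    scale = 2.0 * (norm_Xv - rho * np.linalg.norm(v, 1))
    return scale * (X.T @ Xv / norm_Xv - rho * np.sign(v))


def top_k_support(g, k):
    """Indices of the k entries of g with largest magnitude (assumed rule)."""
    return np.sort(np.argsort(np.abs(g))[-k:])
```

With a $k$-sparse $\bm{v}$ and a large $\rho$, the entries of the gradient on the support of $\bm{v}$ carry the extra $\rho \, {\tt sgn}(\bm{v})$ term, so `top_k_support(gradient(X, v, rho), k)` tends to return the support of $\bm{v}$ itself, in line with the explanation given in the caption.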

Theorems & Definitions (15)

  • Proposition 1
  • Proposition 2
  • Theorem 1
  • Proposition 3
  • Proposition 4
  • Theorem 2
  • Theorem 3
  • Remark 1
  • Remark 2
  • Proposition 5
  • ...and 5 more