Sparse Representations Improve Adversarial Robustness of Neural Network Classifiers

Killian Steunou, Théo Druilhe, Sigurd Saue

TL;DR

This work investigates Sparse Principal Component Analysis (SPCA) as a data-adaptive front-end to improve adversarial robustness of neural classifiers. It provides exact robustness certificates for linear heads on SPCA features under both $\ell_\infty$ and $\ell_2$ threat models and extends the analysis via Lipschitz bounds for general non-linear heads, showing sparsity tightens end-to-end sensitivity. Theoretical results show the certified radius grows as the dual norms $\|W^\top u\|_1$ or $\|W^\top u\|_2$ shrink when SPCA promotes sparsity, which is confirmed empirically on MNIST and CIFAR-Binary with a small non-linear classifier. Across white-box and black-box attacks, SPCA-based models degrade more gracefully than PCA while maintaining competitive clean accuracy, suggesting sparse front-ends as a principled and practical defense against adversarial perturbations.

Abstract

Deep neural networks perform remarkably well on image classification tasks but remain vulnerable to carefully crafted adversarial perturbations. This work revisits linear dimensionality reduction as a simple, data-adapted defense. We empirically compare standard Principal Component Analysis (PCA) with its sparse variant (SPCA) as front-end feature extractors for downstream classifiers, and we complement these experiments with a theoretical analysis. On the theory side, we derive exact robustness certificates for linear heads applied to SPCA features: for both $\ell_\infty$ and $\ell_2$ threat models (binary and multiclass), the certified radius grows as the dual norms of $W^\top u$ shrink, where $W$ is the projection and $u$ the head weights. We further show that for general (non-linear) heads, sparsity reduces operator-norm bounds through a Lipschitz composition argument, predicting lower input sensitivity. Empirically, with a small non-linear network after the projection, SPCA consistently degrades more gracefully than PCA under strong white-box and black-box attacks while maintaining competitive clean accuracy. Taken together, the theory identifies the mechanism (sparser projections reduce adversarial leverage) and the experiments verify that this benefit persists beyond the linear setting. Our code is available at https://github.com/killian31/SPCARobustness.

Paper Structure

This paper contains 36 sections, 5 theorems, 13 equations, 4 figures.

Key Result

Theorem 1

If $m(x) > \varepsilon\,\|W^\top u\|_1$, then for all perturbations $\delta$ with $\|\delta\|_\infty \le \varepsilon$ the prediction is invariant: $f(x+\delta) = f(x)$.
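
The condition in Theorem 1 can be checked numerically. The sketch below is an illustration, not the authors' code. It assumes a binary linear head with score $s(x) = u^\top W x + b$, prediction $f(x) = \mathrm{sign}(s(x))$, and margin $m(x) = |s(x)|$, with $W$ stored as a (components × input dimension) matrix. The bias `b` and the function names are hypothetical. Under these assumptions, Hölder's inequality gives $|u^\top W \delta| \le \|W^\top u\|_1 \|\delta\|_\infty$, which is where the threshold comes from.

```python
import numpy as np


def linf_certificate(W, u, b, x, eps):
    """Check the Theorem 1 condition m(x) > eps * ||W^T u||_1.

    Assumed model (an illustration, not taken from the paper's code):
    score s(x) = u^T (W x) + b, prediction sign(s(x)), margin m(x) = |s(x)|.
    Shapes: W is (k, d), u is (k,), x is (d,).
    Returns (certified, radius) with radius = m(x) / ||W^T u||_1.
    """
    w_eff = W.T @ u                       # effective input-space weight, shape (d,)
    margin = abs(float(w_eff @ x + b))    # m(x)
    dual = float(np.abs(w_eff).sum())     # ||W^T u||_1, the dual norm of l_inf
    radius = margin / dual if dual > 0 else float("inf")
    return margin > eps * dual, radius


def worst_case_linf(W, u, b, x, eps):
    """Perturbation with ||delta||_inf <= eps that lowers the margin the most."""
    w_eff = W.T @ u
    score = float(w_eff @ x + b)
    # Push every coordinate against the sign of the current score.
    return -np.sign(score) * eps * np.sign(w_eff)


if __name__ == "__main__":
    # Toy values chosen only for illustration.
    W = np.array([[1.0, 0.0, 2.0],
                  [0.0, -1.0, 0.0]])
    u = np.array([1.0, 2.0])
    b = 0.5
    x = np.array([1.0, 1.0, 1.0])

    certified, radius = linf_certificate(W, u, b, x, eps=0.2)
    delta = worst_case_linf(W, u, b, x, eps=0.2)
    print("certified:", certified, "radius:", radius)
    print("score after worst-case perturbation:", (W.T @ u) @ (x + delta) + b)
```

Because the worst-case perturbation attains the Hölder bound, the prediction flips as soon as $\varepsilon$ exceeds the returned radius, which is the sense in which a certificate of this form is exact for a linear head.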

Figures (4)

  • Figure 1: Classification accuracy of PCA‑ and SPCA‑based classifiers under white‑box attacks with $\ell_\infty$ perturbations. Solid lines correspond to SPCA and dashed lines to PCA; colors indicate the number of retained components (100-200).
  • Figure 2: Classification accuracy of PCA‑ and SPCA‑based classifiers under white‑box attacks with $\ell_2$ perturbations. Solid lines correspond to SPCA and dashed lines to PCA; colors indicate the number of retained components.
  • Figure 3: Classification accuracy on MNIST and CIFAR‑Binary under the Square Attack. Solid lines correspond to SPCA; dashed lines to PCA; colors denote the number of retained components.
  • Figure 4: Representative clean (left columns) and adversarial (right columns) images for MNIST and CIFAR‑Binary across increasing $\varepsilon$. Rows correspond to different attack types. We visualize examples for models with 200 principal components. Best viewed zoomed in.

Theorems & Definitions (5)

  • Theorem 1: Exact $\ell_\infty$ robustness certificate
  • Theorem 2: Exact $\ell_2$ robustness certificate
  • Theorem 3: Multiclass $\ell_\infty$ certificate
  • Theorem 4: Multiclass $\ell_2$ certificate
  • Lemma 5: Dual-norm control via column norms
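
The theorem statements for the $\ell_2$ and multiclass cases are not reproduced in this listing, so the following is only a sketch of the standard pairwise-margin form such certificates usually take. It assumes logits $s(x) = U W x + b$ with one row of $U$ per class. For each rival class $j$, it divides the logit gap by the dual norm of the difference of effective weights: $\ell_1$ for an $\ell_\infty$ threat and $\ell_2$ for an $\ell_2$ threat. The names `U`, `b`, and `multiclass_certified_radius` are hypothetical.

```python
import numpy as np


def multiclass_certified_radius(W, U, b, x, p):
    """Pairwise-margin certified radius for l_p perturbations, p in {2, np.inf}.

    Assumed model (an illustration, not the paper's exact statement):
    logits s(x) = U (W x) + b, with W of shape (k, d), U of shape (C, k),
    b of shape (C,), x of shape (d,).
    Returns (predicted_class, radius).
    """
    if p == np.inf:
        dual_ord = 1        # the dual of l_inf is l_1
    elif p == 2:
        dual_ord = 2        # l_2 is its own dual
    else:
        raise ValueError("p must be 2 or np.inf")

    A = U @ W                              # effective per-class input weights, (C, d)
    logits = A @ x + b
    c = int(np.argmax(logits))

    radius = np.inf
    for j in range(logits.shape[0]):
        if j == c:
            continue
        gap = logits[c] - logits[j]                      # pairwise margin
        dual = np.linalg.norm(A[c] - A[j], ord=dual_ord)
        if dual > 0:
            radius = min(radius, gap / dual)
    return c, float(radius)


if __name__ == "__main__":
    # Toy three-class example; values are illustrative only.
    W = np.eye(2)
    U = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
    b = np.zeros(3)
    x = np.array([2.0, 1.0])
    print(multiclass_certified_radius(W, U, b, x, p=np.inf))
    print(multiclass_certified_radius(W, U, b, x, p=2))
```

In this form the radius is set by the closest rival class, so shrinking the dual norm of any pairwise difference $W^\top(u_c - u_j)$ can only enlarge the certified region, consistent with the mechanism described in the abstract.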