Sparse Representations Improve Adversarial Robustness of Neural Network Classifiers

Killian Steunou; Théo Druilhe; Sigurd Saue

TL;DR

This work investigates Sparse Principal Component Analysis (SPCA) as a data-adaptive front-end for improving the adversarial robustness of neural classifiers. It provides exact robustness certificates for linear heads on SPCA features under both $\ell_\infty$ and $\ell_2$ threat models, and extends the analysis to general non-linear heads via Lipschitz bounds, showing that sparsity tightens end-to-end sensitivity. The theory shows that the certified radius grows as the dual norms $\|W^\top u\|_1$ (for $\ell_\infty$) or $\|W^\top u\|_2$ (for $\ell_2$) shrink, which SPCA's sparsity encourages; this is confirmed empirically on MNIST and CIFAR-binary with a small non-linear classifier. Across white-box and black-box attacks, SPCA-based models degrade more gracefully than PCA-based ones while maintaining competitive clean accuracy, suggesting sparse front-ends as a principled and practical defense against adversarial perturbations.
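A minimal sketch of the linear-head certificate, assuming features $z = Wx$ with $W \in \mathbb{R}^{k \times d}$ holding one (sparse) component per row and logits $U^\top z + b$, with any centering folded into $b$. These conventions and the function names are hypothetical, not taken from the linked repository. For the predicted class $y$ and a rival $j$, the logit margin is linear in $x$ with input-space weights $W^\top(u_y - u_j)$, so by Hölder's inequality the $\ell_p$ distance to that decision boundary is the margin divided by the dual norm $\|W^\top(u_y - u_j)\|_q$ with $1/p + 1/q = 1$; the certified radius is the minimum over rivals.

```python
import numpy as np


def dual_exponent(p):
    """Hölder conjugate q with 1/p + 1/q = 1 (p = inf -> q = 1, p = 2 -> q = 2)."""
    if p == np.inf:
        return 1
    if p == 1:
        return np.inf
    return p / (p - 1)


def certified_radius(x, W, U, b, p=np.inf):
    """l_p certified radius of a linear head on projected features (sketch).

    Assumed (hypothetical) setup:
      features z = W @ x, W of shape (k, d), one component per row;
      logits   = U.T @ z + b, U of shape (k, C), b of shape (C,).
    The prediction stays y while every pairwise margin stays positive. Each
    margin is an affine function of x, so the distance to its zero set is
    margin / ||W.T (u_y - u_j)||_q. The distance to a union of half-spaces is
    the minimum of the individual distances, so the min over j is exact.
    """
    q = dual_exponent(p)
    logits = U.T @ (W @ x) + b
    y = int(np.argmax(logits))
    radii = []
    for j in range(U.shape[1]):
        if j == y:
            continue
        w_eff = W.T @ (U[:, y] - U[:, j])  # effective input-space weights
        margin = logits[y] - logits[j]
        radii.append(margin / np.linalg.norm(w_eff, ord=q))
    return y, float(min(radii))


# Tiny usage example: two one-pixel components, two classes.
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
U = np.array([[1.0, -1.0],
              [1.0, -1.0]])
b = np.zeros(2)
x = np.array([1.0, 0.0, 5.0, 5.0])
print(certified_radius(x, W, U, b, p=np.inf))
print(certified_radius(x, W, U, b, p=2))
```

Because $W^\top(u_y - u_j)$ is a combination of the rows of $W$, its support lies inside the union of their supports, so sparse components tend to shrink the $\ell_1$ denominator that governs the $\ell_\infty$ radius. In the toy example, the pixels outside both components cannot move the logits at all, no matter how large their perturbation.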

Abstract

Deep neural networks perform remarkably well on image classification tasks but remain vulnerable to carefully crafted adversarial perturbations. This work revisits linear dimensionality reduction as a simple, data-adapted defense. We empirically compare standard Principal Component Analysis (PCA) with its sparse variant (SPCA) as front-end feature extractors for downstream classifiers, and we complement these experiments with a theoretical analysis. On the theory side, we derive exact robustness certificates for linear heads applied to SPCA features: for both $\ell_\infty$ and $\ell_2$ threat models (binary and multiclass), the certified radius grows as the dual norms of $W^\top u$ shrink, where $W$ is the projection and $u$ the head weights. We further show that for general (non-linear) heads, sparsity reduces operator-norm bounds through a Lipschitz composition argument, predicting lower input sensitivity. Empirically, with a small non-linear network after the projection, SPCA consistently degrades more gracefully than PCA under strong white-box and black-box attacks while maintaining competitive clean accuracy. Taken together, the theory identifies the mechanism (sparser projections reduce adversarial leverage) and the experiments verify that this benefit persists beyond the linear setting. Our code is available at https://github.com/killian31/SPCARobustness.
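The Lipschitz composition argument can be illustrated numerically in the $\ell_\infty$ case. The sketch below is an assumption-laden illustration, not the paper's bound. It takes a head $h$ built from linear layers and coordinatewise 1-Lipschitz activations such as ReLU, and it bounds $\mathrm{Lip}_\infty(h \circ W)$ by the product of per-layer $\|\cdot\|_{\infty \to \infty}$ norms (each the maximum row $\ell_1$ norm). It then compares a dense, PCA-like projection with a sparse, SPCA-like one, both with randomly drawn unit-$\ell_2$ rows (hypothetical data, not learned components). For a unit-$\ell_2$ row with $s$ nonzeros, Cauchy-Schwarz gives an $\ell_1$ norm of at most $\sqrt{s}$, so the sparse projection's factor is capped at $\sqrt{20} \approx 4.47$, while the dense one can reach $\sqrt{784} = 28$.

```python
import numpy as np


def inf_operator_norm(A):
    """||A||_{inf->inf}: the largest l1 norm of any row of A."""
    return float(np.max(np.sum(np.abs(A), axis=1)))


def lipschitz_bound_inf(W, head_weights):
    """Upper bound on the l_inf -> l_inf Lipschitz constant of x -> h(W x).

    Assumes h alternates linear layers (weights in head_weights) with
    coordinatewise 1-Lipschitz activations such as ReLU; biases do not
    affect the constant, so the bound is a product of operator norms.
    """
    bound = inf_operator_norm(W)
    for A in head_weights:
        bound *= inf_operator_norm(A)
    return bound


def unit_rows(k, d, nnz, rng):
    """Hypothetical random k x d projection: unit-l2 rows, nnz nonzeros each."""
    W = np.zeros((k, d))
    for i in range(k):
        idx = rng.choice(d, size=nnz, replace=False)
        W[i, idx] = rng.standard_normal(nnz)
        W[i] /= np.linalg.norm(W[i])
    return W


rng = np.random.default_rng(0)
d, k = 784, 32                        # MNIST-sized input, 32 components
W_dense = unit_rows(k, d, d, rng)     # PCA-like: every pixel loads on every component
W_sparse = unit_rows(k, d, 20, rng)   # SPCA-like: 20 pixels per component
head = [rng.standard_normal((64, k)) / np.sqrt(k),
        rng.standard_normal((10, 64)) / 8.0]

print("projection factor:", inf_operator_norm(W_dense), inf_operator_norm(W_sparse))
print("end-to-end bound:", lipschitz_bound_inf(W_dense, head),
      lipschitz_bound_inf(W_sparse, head))
```

With the head held fixed, the two end-to-end bounds differ only through the projection factor. This mirrors, in simplified form, the prediction that sparser projections lower input sensitivity. The bound is loose, and learned SPCA loadings need not be unit-normalized.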