Feature compression is the root cause of adversarial fragility in neural network classifiers

Jingchao Gao, Ziqing Lu, Raghu Mudumbai, Xiaodong Wu, Jirong Yi, Myung Cho, Catherine Xu, Hui Xie, Weiyu Xu

TL;DR

This work tackles why neural network classifiers exhibit adversarial fragility by contrasting their worst-case perturbation tolerance with that of an optimal minimum-distance classifier. Through a matrix-theoretic framework leveraging QR decomposition and random-matrix theory, it shows that NN robustness can scale as $O(1)$ while the optimal scales as $O(\sqrt{d})$, giving a relative gap of $O(1/\sqrt{d})$ as dimension grows. The fragility is attributed to feature compression, whereby networks rely on a compressed subset of features, enabling small perturbations along those directions to flip outputs. The analysis covers linear and multi-layer nonlinear networks and extends to exponentially many data points, with extensive numerical validation on MNIST and ImageNet supporting the theory. The results unify an information-theoretic feature-compression perspective with concrete algebraic mechanisms, offering guidance for designing more robust architectures and training methods.
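In symbols, the scaling claim can be written as follows, where $\epsilon_{\mathrm{NN}}(d)$ and $\epsilon_{\mathrm{opt}}(d)$ denote the smallest output-flipping perturbations of the trained network and of the optimal minimum-distance classifier, respectively (this notation is ours, not the paper's):

$$
\frac{\epsilon_{\mathrm{NN}}(d)}{\epsilon_{\mathrm{opt}}(d)} \;=\; \frac{O(1)}{O(\sqrt{d})} \;=\; O\!\left(\frac{1}{\sqrt{d}}\right) \;\longrightarrow\; 0 \quad \text{as } d \to \infty .
$$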

Abstract

In this paper, we uniquely study the adversarial robustness of deep neural networks (NN) for classification tasks against that of optimal classifiers. We look at the smallest magnitude of possible additive perturbations that can change a classifier's output. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural networks for classification. In particular, our theoretical results show that a neural network's adversarial robustness can degrade as the input dimension $d$ increases. Analytically, we show that neural networks' adversarial robustness can be only $1/\sqrt{d}$ of the best possible adversarial robustness of optimal classifiers. Our theories match remarkably well with numerical experiments of practically trained NN, including NN for ImageNet images. The matrix-theoretic explanation is consistent with an earlier information-theoretic feature-compression-based explanation for the adversarial fragility of neural networks.
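To make the $1/\sqrt{d}$ gap concrete, below is a minimal numerical sketch. It is not the paper's QR-based construction: it assumes an extreme form of feature compression (a linear classifier supported on a single input coordinate) purely to contrast an $O(1)$ worst-case perturbation with the $O(\sqrt{d})$ tolerance of a minimum-distance classifier; the function name and setup are hypothetical.

```python
# Illustrative toy (not the paper's exact construction): two classes whose
# templates differ by +/-1 in every coordinate. The optimal minimum-distance
# classifier tolerates perturbations of size ~sqrt(d), while a classifier
# that "compresses" its decision onto one coordinate is flipped by an O(1)
# perturbation.
import numpy as np

def robustness_gap(d):
    x_pos = np.ones(d)      # class +1 template
    x_neg = -np.ones(d)     # class -1 template

    # Optimal minimum-distance classifier: the smallest perturbation that
    # moves x_pos closer to x_neg is half the inter-class distance.
    optimal = 0.5 * np.linalg.norm(x_pos - x_neg)   # = sqrt(d)

    # Compressed classifier: predicts sign(w @ x) with w supported on a
    # single coordinate (an extreme form of feature compression).
    w = np.zeros(d)
    w[0] = 1.0
    # Smallest L2 perturbation flipping sign(w @ x_pos) equals the margin
    # |w @ x_pos| / ||w||, which stays O(1) regardless of d.
    fragile = abs(w @ x_pos) / np.linalg.norm(w)    # = 1

    return optimal, fragile

for d in (16, 256, 4096):
    opt, frag = robustness_gap(d)
    print(f"d={d:5d}  optimal~{opt:7.2f}  compressed~{frag:4.2f}  ratio~{frag/opt:.3f}")
```

Both classifiers label the two templates correctly, yet the compressed one is flipped by a unit-size perturbation along its single active direction, mirroring the claim that reliance on compressed features is what creates the fragility.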

Paper Structure

This paper contains 7 sections, 7 theorems, 50 equations, 6 figures, and 8 tables.

Key Result

Theorem 1

For each class $i$, suppose that the neural network satisfies the condition stated in the paper. Then the corresponding bound on its adversarial robustness holds (the condition and the bound are given as equations in the full text).

Figures (6)

  • Figure 1: Illustration plot of the feature compression concept.
  • Figure 2: Theoretical predictions $\phi$ (from QR decomposition) match empirical NN's $|\cos(\theta_1)|$
  • Figure 3: Clean image "7" and artificial image "1"
  • Figure 4: English springer
  • Figure 5: Afghan hound
  • ...and 1 more figure

Theorems & Definitions (7)

  • Theorem 1
  • Lemma 2
  • Lemma 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Theorem 7