Excessive Invariance Causes Adversarial Vulnerability

Jörn-Henrik Jacobsen, Jens Behrmann, Richard Zemel, Matthias Bethge

TL;DR

The paper reframes adversarial vulnerability as arising not only from excessive sensitivity to small perturbations but also from excessive invariance to semantically meaningful input changes. Using fully invertible networks, it shows that class-relevant content can be manipulated without altering activations, and it links this phenomenon to an information-theoretic insufficiency of the standard cross-entropy objective. The authors introduce Independence Cross-Entropy (iCE), which maximizes label information in the semantic variables while using a nuisance classifier to limit label information in the nuisance variables (plus a maximum-likelihood term), and they demonstrate reduced invariance-based vulnerabilities on MNIST, ImageNet, and shiftMNIST benchmarks. The work offers a principled path to improving robustness under unrestricted distribution shifts: invertible architectures give direct access to the classifier's decision space, and the extended objective encourages the model to use all task-relevant features in its decision.
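The two-player structure of iCE summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the encoder output is already split into semantic logits `z_s` and nuisance variables `z_n`, it uses a hypothetical linear nuisance classifier (`nuisance_weights`), and it omits the maximum-likelihood term. The function names are illustrative assumptions.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the integer labels.
    return -log_softmax(logits)[np.arange(len(labels)), labels].mean()

def ice_losses(z_s, z_n, labels, nuisance_weights):
    """Sketch of the two competing objectives (assumed form).

    z_s: (batch, n_classes) semantic logits
    z_n: (batch, d_n) nuisance variables
    nuisance_weights: (d_n, n_classes) hypothetical linear nuisance classifier
    """
    semantic_ce = cross_entropy(z_s, labels)
    nuisance_ce = cross_entropy(z_n @ nuisance_weights, labels)
    # Encoder: predict the label from z_s and make z_n uninformative,
    # i.e. drive the nuisance classifier's cross-entropy up.
    encoder_loss = semantic_ce - nuisance_ce
    # Nuisance classifier: predict the label from z_n as well as it can.
    nuisance_classifier_loss = nuisance_ce
    return encoder_loss, nuisance_classifier_loss

# Toy usage: confident semantic logits, untrained (all-zero) nuisance classifier.
labels = np.array([0, 1, 2])
z_s = 20.0 * np.eye(3)
z_n = np.random.default_rng(0).standard_normal((3, 5))
enc_loss, nc_loss = ice_losses(z_s, z_n, labels, np.zeros((5, 3)))
print(enc_loss, nc_loss)
```

In an actual training loop the two losses would be minimized in alternation with respect to the encoder and nuisance-classifier parameters, respectively; here they are only evaluated once on fixed inputs.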

Abstract

Despite their impressive performance, deep neural networks exhibit striking failures on out-of-distribution inputs. One core idea of adversarial example research is to reveal neural network errors under such distribution shifts. We decompose these errors into two complementary sources: sensitivity and invariance. We show deep networks are not only too sensitive to task-irrelevant changes of their input, as is well-known from epsilon-adversarial examples, but are also too invariant to a wide range of task-relevant changes, thus making vast regions in input space vulnerable to adversarial attacks. We show such excessive invariance occurs across various tasks and architecture types. On MNIST and ImageNet one can manipulate the class-specific content of almost any image without changing the hidden activations. We identify an insufficiency of the standard cross-entropy loss as a reason for these failures. Further, we extend this objective based on an information-theoretic analysis so it encourages the model to consider all task-dependent features in its decision. This provides the first approach tailored explicitly to overcome excessive invariance and resulting vulnerabilities.

Paper Structure

This paper contains 18 sections, 4 theorems, 17 equations, 9 figures, 2 tables.

Key Result

Theorem 6

Let $\mathcal{D}_{Adv}$ denote the adversarial distribution and $\mathcal{D}$ the training distribution. Assume $I_{\mathcal{D}}(y; z_n) = 0$ by minimizing $\mathcal{L}_{iCE}$ and the distribution shift satisfies $I_{\mathcal{D}_{Adv}}(z_n; y) \leq I_{\mathcal{D}}(z_n; y)$ and $I_{\mathcal{D}_{Adv}}

Figures (9)

  • Figure 1: All images shown cause a competitive ImageNet-trained network to output the exact same probabilities over all 1000 classes (logits shown above each image). The leftmost image is from the ImageNet validation set; all other images are constructed such that they match the non-class-related information of images taken from other classes (for details see the section on metameric sampling). The excessive invariance revealed by this set of adversarial examples demonstrates that the logits contain only a small fraction of the information perceptually relevant to humans for discrimination between the classes.
  • Figure 2: Connection between (1) invariance-based (long pink arrow) and (2) perturbation-based adversarial examples (short orange arrow). Class distributions are shown in green and blue; dashed line is the decision-boundary of a classifier. All adversarial examples can be reached either by crossing the decision-boundary of the classifier via perturbations, or by moving within the pre-image of the classifier to mis-classified regions. The two viewpoints are complementary to one another and highlight that adversarial vulnerability is not only caused by excessive sensitivity to semantically meaningless perturbations, but also by excessive insensitivity to semantically meaningful transformations.
  • Figure 3: The fully invertible RevNet, a hybrid of Glow and iRevNet with simple readout structure. $z_s$ represents the logits and $z_n$ the nuisance.
  • Figure 4: Left: Decision-boundaries in 2D subspace spanned by two random data points $x_1,x_2$. Right: Decision-boundaries in 2D subspace spanned by random datapoint $x$ and metamer $x_{met}$.
  • Figure 5: Each column shows three images belonging together. Top row are source images from which we sample the logits, middle row are logit metamers and bottom row images from which we sample the nuisances. Top row and middle row have the same (approximately for ResNets, exactly for fully invertible RevNets) logit activations. Thus, it is possible to change the image content completely without changing the 10- and 1000-dimensional logit vectors respectively. This highlights a striking failure of classifiers to capture all task-dependent variability.
  • ...and 4 more figures
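
The logit-metamer construction described in the Figure 5 caption can be illustrated with a toy bijection. The sketch below is an assumption-laden stand-in, not the paper's fully invertible RevNet: a random orthogonal matrix plays the role of the invertible network, the first `n_classes` output coordinates play the role of the logits $z_s$, and the rest play the role of the nuisance $z_n$. All function names are hypothetical.

```python
import numpy as np

def random_orthogonal(dim, seed=0):
    # QR decomposition of a Gaussian matrix yields an orthogonal matrix,
    # used here as a trivially invertible "network".
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def encode(W, x):
    # Forward pass: z = W x, read as z = [z_s, z_n].
    return W @ x

def decode(W, z):
    # Exact inverse: for orthogonal W, the inverse is the transpose.
    return W.T @ z

def make_metamer(W, x_source, x_nuisance, n_classes):
    # Keep the logits z_s of x_source, take the nuisance z_n of x_nuisance,
    # then invert the network to obtain an input with that mixed code.
    z_src = encode(W, x_source)
    z_nui = encode(W, x_nuisance)
    z_mix = np.concatenate([z_src[:n_classes], z_nui[n_classes:]])
    return decode(W, z_mix)

dim, n_classes = 16, 4
W = random_orthogonal(dim)
rng = np.random.default_rng(1)
x_a = rng.standard_normal(dim)   # input whose logits are kept
x_b = rng.standard_normal(dim)   # input whose nuisance is borrowed
x_met = make_metamer(W, x_a, x_b, n_classes)

logits_a = encode(W, x_a)[:n_classes]
logits_met = encode(W, x_met)[:n_classes]
print(np.allclose(logits_a, logits_met))   # logits agree up to float error
print(np.linalg.norm(x_met - x_a))         # yet the inputs differ
```

Because the map is bijective, the metamer reproduces the source logits up to floating-point error while most of its remaining code comes from the other input, which is the sense in which the input can change substantially without the logits changing.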

Theorems & Definitions (14)

  • Definition 1: Pre-images / Invariance
  • Definition 2: Perturbation-based Adversarial Examples
  • Definition 3: Invariance-based Adversarial Examples
  • Definition 4: Semantic/Nuisance perturbation of an input
  • Definition 5: Independence cross-entropy loss
  • Theorem 6: Information $\boldsymbol{I_{\mathcal{D}_{Adv}}(y; z_s)}$ maximal after distribution shift
  • Example 7: Semantic and nuisance on Adversarial Spheres (Gilmer et al., 2018)
  • Example 8: Mis-aligned classifier on Adversarial Spheres
  • Lemma 9: Variational lower bound on mutual information
  • Lemma 10: Effect of nuisance classifier
  • ...and 4 more