Evaluating Robustness to Context-Sensitive Feature Perturbations of Different Granularities

Isaac Dunn, Laura Hanu, Hadrien Pouget, Daniel Kroening, Tom Melham

TL;DR

This paper tackles the problem of robustness under deployment-time distributional shifts by introducing context-sensitive, granularity-aware perturbations crafted through latent activations of pretrained generators. By perturbing activations across generator layers, the authors simulate both coarse and fine-grained semantic changes and evaluate state-of-the-art ImageNet classifiers, revealing widespread vulnerability to such perturbations. A key finding is that adversarial training against pixel-space perturbations can degrade robustness to coarse-grained, context-sensitive perturbations, highlighting a trade-off and the need for broader robustness frameworks. The work demonstrates generalizability across datasets (MNIST, CelebA-HQ) and generators, emphasizing the necessity of evaluation beyond pixel-space perturbations to better understand and improve real-world model reliability.

Abstract

We cannot guarantee that training datasets are representative of the distribution of inputs that will be encountered during deployment. So we must have confidence that our models do not over-rely on this assumption. To this end, we introduce a new method that identifies context-sensitive feature perturbations (e.g. shape, location, texture, colour) to the inputs of image classifiers. We produce these changes by performing small adjustments to the activation values of different layers of a trained generative neural network. Perturbing at layers earlier in the generator causes changes to coarser-grained features; perturbations further on cause finer-grained changes. Unsurprisingly, we find that state-of-the-art classifiers are not robust to any such changes. More surprisingly, when it comes to coarse-grained feature changes, we find that adversarial training against pixel-space perturbations is not just unhelpful: it is counterproductive.
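The layer-wise perturbation idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes the perturbation is additive and applied to the activations entering each layer, and it uses a tiny random dense network as a stand-in for a trained generator. The names `make_layer` and `perturbed_forward` are hypothetical.

```python
import math
import random


def make_layer(in_dim, out_dim, rng):
    """Build a toy dense layer with tanh; a stand-in for a real generator layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def layer(x):
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

    return layer


def perturbed_forward(layers, z, deltas):
    """Run z through the layers, adding deltas[i] to the input of layer i.

    deltas[i] is None when layer i is left unperturbed.
    Additive perturbation is an assumption made for this sketch.
    """
    h = z
    for layer, delta in zip(layers, deltas):
        if delta is not None:
            h = [a + d for a, d in zip(h, delta)]
        h = layer(h)
    return h


rng = random.Random(0)
dims = [4, 8, 8, 6]  # latent size, two hidden sizes, output size (all illustrative)
layers = [make_layer(dims[i], dims[i + 1], rng) for i in range(len(dims) - 1)]
z = [rng.gauss(0, 1) for _ in range(dims[0])]

# No perturbation, an early-layer perturbation, and a late-layer perturbation.
clean = perturbed_forward(layers, z, [None, None, None])
early = perturbed_forward(layers, z, [[0.1] * 4, None, None])
late = perturbed_forward(layers, z, [None, None, [0.1] * 8])
```

In a real generator, the early-layer perturbation would pass through more of the network before reaching the image, which fits the abstract's observation that earlier layers control coarser-grained features. How the perturbations are chosen to cause a misclassification is outside the scope of this sketch.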

Paper Structure

This paper contains 45 sections, 16 figures, 6 tables.

Figures (16)

  • Figure 1: An example of changing the computed classification from 'volcano' to target label 'goldfish' using context-sensitive feature perturbations of all granularities. Coarser-grained changes include darkening the sky, causing an eruption of lava, and adding a rocky outcrop in the foreground; finer-grained changes include slightly flattening the curve of the volcano, and adjustments to the texture of the trees, rocks and cloud.
  • Figure 2: Illustration of a forward pass with perturbations to the latent activation values at $n$ layers in the generator network.
  • Figure 3: Context-sensitive feature perturbations at different granularities, as controlled by perturbing activations at the generator layers indicated under each image. Differences with the unperturbed image are shown above each perturbed image. The perturbed Pomeranians (dogs) are classified as 'red king crabs', the volcanos as 'goldfish', and redshanks (birds) as 'rams'.
  • Figure 4: Graphs showing how the cumulative proportion of perturbations that induce the targeted misclassification increases with maximum perturbation magnitude. The steeper the line, the less robust the classifier to that perturbation type. The lines and translucent areas shown are the means and standard deviations between several experiments of 30 images each.
  • Figure 5: Screenshot of labelling interface. The perturbed image and buttons, on the right-hand side, are visible only when the unperturbed image (on the left) has been selected as matching the desired label. The buttons are numbered to provide keyboard shortcuts. The button at the bottom opens a web image search, in case the user is unfamiliar with the class label.
  • ...and 11 more figures
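
The curves described in the Figure 4 caption can be computed along the following lines. This is a minimal sketch, not the paper's evaluation code. It assumes each image yields the smallest perturbation magnitude that induced the targeted misclassification, or `None` if no perturbation succeeded. The function name `cumulative_success`, the sample values, and the use of population standard deviation are all illustrative assumptions.

```python
from statistics import mean, pstdev


def cumulative_success(magnitudes, thresholds):
    """Fraction of images misclassified by a perturbation of magnitude <= t, per t.

    magnitudes: smallest successful magnitude per image, or None on failure.
    """
    total = len(magnitudes)
    return [
        sum(1 for m in magnitudes if m is not None and m <= t) / total
        for t in thresholds
    ]


thresholds = [0.5, 1.0, 1.5, 2.0]

# Two made-up experiments of five images each (the paper uses 30 images each).
experiments = [
    [0.5, 1.2, None, 0.8, 2.0],
    [0.4, None, None, 1.1, 0.9],
]

curves = [cumulative_success(e, thresholds) for e in experiments]
curve = curves[0]  # [0.2, 0.4, 0.6, 0.8]

# Mean and spread across experiments at each threshold, as plotted in the figure.
mean_curve = [mean(col) for col in zip(*curves)]
std_curve = [pstdev(col) for col in zip(*curves)]
```

A curve that rises quickly at small thresholds corresponds to the "steeper line" in the caption: most images are misclassified under small perturbations, so the classifier is less robust to that perturbation type.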