Evaluating Robustness to Context-Sensitive Feature Perturbations of Different Granularities
Isaac Dunn, Laura Hanu, Hadrien Pouget, Daniel Kroening, Tom Melham
TL;DR
This paper addresses robustness to distributional shift at deployment time by introducing context-sensitive perturbations of controllable granularity, produced by perturbing the latent activations of pretrained generators. Perturbing activations at different generator layers simulates both coarse-grained and fine-grained semantic changes. Evaluating state-of-the-art ImageNet classifiers against these perturbations reveals widespread vulnerability. A key finding is that adversarial training against pixel-space perturbations can degrade robustness to coarse-grained, context-sensitive perturbations, which points to a trade-off and to the need for broader robustness frameworks. The approach generalizes to other datasets (MNIST, CelebA-HQ) and generators, underlining that evaluation must go beyond pixel-space perturbations to understand and improve real-world model reliability.
Abstract
We cannot guarantee that training datasets are representative of the distribution of inputs that will be encountered during deployment, so we must have confidence that our models do not over-rely on this assumption. To this end, we introduce a new method that identifies context-sensitive feature perturbations (e.g. shape, location, texture, colour) to the inputs of image classifiers. We produce these changes by making small adjustments to the activation values at different layers of a trained generative neural network. Perturbing at layers earlier in the generator causes changes to coarser-grained features; perturbing at later layers causes finer-grained changes. Unsurprisingly, we find that state-of-the-art classifiers are not robust to any such changes. More surprisingly, when it comes to coarse-grained feature changes, we find that adversarial training against pixel-space perturbations is not just unhelpful: it is counterproductive.
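
The link between generator depth and perturbation granularity can be illustrated with a toy sketch. This is not the authors' implementation: the two-layer "generator" below is a hypothetical stand-in that uses nearest-neighbour upsampling followed by tanh, and the names `upsample_layer`, `generate`, and `changed_pixels` are invented for illustration. Under those assumptions, it nudges a single activation at each depth and counts how many output "pixels" change.

```python
import math


def upsample_layer(x):
    """Toy generator layer (assumption): nearest-neighbour 2x upsampling, then tanh."""
    return [math.tanh(v) for v in x for _ in range(2)]


# Hypothetical generator: 4 latent values -> 8 activations -> 16 output "pixels".
LAYERS = [upsample_layer, upsample_layer]


def generate(z, perturbations=None):
    """Run the toy generator, adding perturbations[i] to the input of layer i.

    Index len(LAYERS) refers to the final output, i.e. pixel space.
    """
    perturbations = perturbations or {}
    h = list(z)
    for i, layer in enumerate(LAYERS):
        if i in perturbations:
            h = [a + d for a, d in zip(h, perturbations[i])]
        h = layer(h)
    if len(LAYERS) in perturbations:
        h = [a + d for a, d in zip(h, perturbations[len(LAYERS)])]
    return h


def changed_pixels(z, layer_index, unit=0, eps=0.1):
    """Count output pixels that change when one activation at layer_index moves by eps."""
    size = len(z) * 2 ** layer_index  # each toy layer doubles the width
    delta = [0.0] * size
    delta[unit] = eps
    clean = generate(z)
    perturbed = generate(z, {layer_index: delta})
    return sum(1 for a, b in zip(clean, perturbed) if abs(a - b) > 1e-12)


z = [0.5, -0.3, 0.8, 0.1]
for k in range(len(LAYERS) + 1):
    print(f"perturb at depth {k}: {changed_pixels(z, k)} output pixels change")
```

In this toy setting, a nudge at depth 0 spreads to 4 output pixels, at depth 1 to 2, and in pixel space to 1, which mirrors the coarse-to-fine behaviour the abstract describes. A real generator mixes activations through convolutions and learned weights, so the effect there is semantic rather than a simple pixel count.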
