Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness

Stanislav Fort; Balaji Lakshminarayanan

TL;DR

This work tackles adversarial brittleness in visual recognition by introducing a multi-resolution input prior and a Vickrey-inspired CrossMax self-ensembling strategy that aggregates intermediate-layer predictions. By training on channel-wise stacks of downsampled, jittered views, the approach yields higher-quality representations with intrinsic robustness, even without adversarial training. Empirical results on CIFAR-10/100 show robustness on par with or exceeding state-of-the-art methods, with further gains when combined with light adversarial training. The study also reveals interpretable adversarial perturbations and demonstrates transferable attacks on large vision-language models, emphasizing both robustness and broader implications for human-aligned perception and generative capabilities.

Abstract

Adversarial examples pose a significant challenge to the robustness, reliability and alignment of deep neural networks. We propose a novel, easy-to-use approach to achieving high-quality representations that lead to adversarial robustness through the use of multi-resolution input representations and dynamic self-ensembling of intermediate layer predictions. We demonstrate that intermediate layer predictions exhibit inherent robustness to adversarial attacks crafted to fool the full classifier, and propose a robust aggregation mechanism based on the Vickrey auction, which we call CrossMax, to dynamically ensemble them. By combining multi-resolution inputs and robust ensembling, we achieve significant adversarial robustness on the CIFAR-10 and CIFAR-100 datasets without any adversarial training or extra data, reaching an adversarial accuracy of $\approx$72% (CIFAR-10) and $\approx$48% (CIFAR-100) on the RobustBench AutoAttack suite ($L_\infty = 8/255$) with a finetuned ImageNet-pretrained ResNet152. This result is comparable with the top three models on CIFAR-10 and is a +5% gain over the best current dedicated approach on CIFAR-100. Adding simple adversarial training on top, we get $\approx$78% on CIFAR-10 and $\approx$51% on CIFAR-100, improving SOTA by 5% and 9% respectively and seeing greater gains on the harder dataset. We validate our approach through extensive experiments and provide insights into the interplay between adversarial robustness and the hierarchical nature of deep representations. We show that simple gradient-based attacks against our model lead to human-interpretable images of the target classes as well as interpretable image changes. As a byproduct, using our multi-resolution prior, we turn pre-trained classifiers and CLIP models into controllable image generators and develop successful transferable attacks on large vision language models.
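
The multi-resolution input mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a single-channel image stored as nested Python lists, integer downsampling factors, average pooling, nearest-neighbour upsampling, and additive Gaussian noise as the jitter. The function names (`downsample`, `upsample`, `multi_resolution_stack`) and every parameter are hypothetical. The resolutions, interpolation, and augmentations actually used in the paper are not stated on this page.

```python
import random


def downsample(img, f):
    """Average-pool a 2D image (list of rows) by an integer factor f."""
    h, w = len(img), len(img[0])
    if h % f or w % f:
        raise ValueError("image height and width must be divisible by f")
    return [
        [
            sum(img[y * f + dy][x * f + dx] for dy in range(f) for dx in range(f))
            / (f * f)
            for x in range(w // f)
        ]
        for y in range(h // f)
    ]


def upsample(img, f):
    """Nearest-neighbour upsampling of a 2D image by an integer factor f."""
    return [
        [img[y // f][x // f] for x in range(len(img[0]) * f)]
        for y in range(len(img) * f)
    ]


def multi_resolution_stack(img, factors=(1, 2, 4), noise_std=0.0, seed=0):
    """Return one full-size view per factor, stacked as a list of channels.

    Each view is the image downsampled by `f` and upsampled back to the
    original size, so coarser views keep only low-frequency content.
    The noise jitter is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    channels = []
    for f in factors:
        view = upsample(downsample(img, f), f)
        if noise_std > 0:
            view = [[v + rng.gauss(0.0, noise_std) for v in row] for row in view]
        channels.append(view)
    return channels


# Usage: a 4x4 ramp image turned into a 3-channel multi-resolution input.
image = [[4 * y + x for x in range(4)] for y in range(4)]
stack = multi_resolution_stack(image)
```

A classifier consuming this input would take `len(factors)` channels per original channel instead of one. For an RGB image the same procedure would presumably be applied to each colour channel.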

Paper Structure (27 sections, 25 figures, 5 tables, 1 algorithm)

Introduction
Key Observations and Techniques
The multi-resolution prior
Classifying many versions of the same object at once
Biological eye saccades
Multi-resolution input to a classifier
CrossMax robust ensembling
Robust aggregation methods, Vickrey auctions and load balancing
Only partial overlap between the adversarial susceptibility of intermediate layers
Training and Experimental Results
Model and training details
Adversarial vulnerability evaluation
Multi-resolution finetuning of a pretrained model
Adversarial finetuning
Visualizing attacks against multi-resolution models
...and 12 more sections

Figures (25)

Figure 1: We use a multi-resolution decomposition (a) of an input image and a partial decorrelation of predictions of intermediate layers (b) to build a classifier (c) that has, by default, adversarial robustness comparable to or exceeding the state of the art (f), even without any adversarial training. Optimizing inputs against it leads to interpretable changes (d) and images generated from scratch (e).
Figure 2: Combining channel-wise stacked, augmented and down-sampled versions of the input image with robust intermediate layer class predictions via a CrossMax self-ensemble. The resulting model gains considerable adversarial robustness without any adversarial training or extra data.
Figure 3:
Figure 4: The robust accuracy of different types of ensembles of 10 ResNet18 models under increasing $L_\infty$ attack strength. Our robust median ensemble, CrossMax, gives substantial adversarial accuracy gains to ensembles of individually brittle models. For $L_\infty=6/255$, its CIFAR-10 robust accuracy is 17-fold larger than that of standard ensembling, and for CIFAR-100 the factor is 12.
Figure 5: The impact of adversarial attacks ($L_\infty = 8/255$, 128 attacks) against the full classifier on the accuracy and probabilities at all intermediate layers for an ImageNet-1k pretrained ResNet152 finetuned on CIFAR-10 via trained linear probes. The left panel shows the prediction accuracy on clean, unperturbed images, which rises from layer to layer, and the accuracy on adversarially attacked images, which is only lightly affected for all layers apart from the very last ones. These are the closest to the last layer, whose classification the attack was designed against. On the right panel, the mean predicted probability of the ground truth class and the target class of the adversary (always different from the ground truth) are shown. The target class probability only rises for the very last layers. Therefore the intermediate activations of an adversarially attacked image do not look like the target class, retaining the character of the original class instead.
...and 20 more figures
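
The CrossMax aggregation referred to in Figures 2 and 4 is described on this page only as a Vickrey-auction-inspired, median-like way of combining predictors. The sketch below is one plausible reading under stated assumptions, not the authors' code: each predictor's logits are shifted by that predictor's own maximum, each class is then shifted by its maximum across predictors, and the final score of a class is its k-th highest value across predictors. The two normalization steps, the default `k=2`, and the names `crossmax`, `mean_ensemble` and `argmax` are assumptions made for illustration.

```python
def crossmax(logits, k=2):
    """Aggregate logits from N predictors into one score per class.

    logits: list of N lists, each holding C class logits.
    k: which rank to keep per class (1 = highest, 2 = second highest, ...).
    """
    n, c = len(logits), len(logits[0])
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of predictors")
    # Step 1 (assumed): subtract each predictor's own maximum, so no single
    # predictor can dominate simply by emitting very large logits.
    z = [[v - max(row) for v in row] for row in logits]
    # Step 2 (assumed): subtract each class's maximum across predictors.
    col_max = [max(z[i][j] for i in range(n)) for j in range(c)]
    z = [[z[i][j] - col_max[j] for j in range(c)] for i in range(n)]
    # Step 3: keep the k-th highest value per class, Vickrey-style.
    return [sorted((z[i][j] for i in range(n)), reverse=True)[k - 1] for j in range(c)]


def mean_ensemble(logits):
    """Standard ensembling baseline: average the logits per class."""
    n = len(logits)
    return [sum(col) / n for col in zip(*logits)]


def argmax(scores):
    """Index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=lambda j: scores[j])


# Usage: two predictors favour class 0; a third has been pushed to class 2
# with an extreme logit, as a successful attack on one member might do.
toy_logits = [[5, 1, 0], [4, 2, 0], [0, 0, 100]]
robust_choice = argmax(crossmax(toy_logits))
naive_choice = argmax(mean_ensemble(toy_logits))
```

In this toy case the averaged logits follow the single extreme predictor, while the k-th-highest rule sides with the two predictors that agree. This matches the intuition behind the gains over standard ensembling reported in Figure 4, but it is only an illustration and says nothing about the reported numbers.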
