Robustness to Adversarial Examples through an Ensemble of Specialists

Mahdieh Abbasi, Christian Gagné

TL;DR

The paper tackles CNN vulnerability to adversarial examples by reframing robustness as open-set recognition and proposing a specialists+1 ensemble. It builds diverse class-specialist CNNs from confusion-matrix-derived subsets plus a generalist, and introduces a voting mechanism to either rely on a confident winner or activate all classifiers when uncertainty is high. Empirical results on MNIST and CIFAR-10 show reduced confidence for adversaries and effective rejection at suitable thresholds, with minimal impact on clean samples. The findings suggest robustness can be enhanced by rejection rather than aggressive reclassification, offering a practical defense against a range of adversarial attacks.
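
The voting-and-rejection idea summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `ensemble_predict`, the rule that a "confident winner" is a class voted for by every classifier whose subset contains it, and the representation of each classifier's output as a probability list over all classes (zero outside its subset) are assumptions made for the example.

```python
def ensemble_predict(probs, subsets, tau):
    """Hypothetical sketch of voting plus rejection.

    probs   : list of per-classifier probability lists over all classes
              (assumed to be zero for classes outside that classifier's subset)
    subsets : list of sets; subsets[i] holds the classes classifier i covers
    tau     : confidence threshold below which the input is rejected
    Returns (label or None, confidence).
    """
    num_classes = len(probs[0])

    # Each classifier votes for its most probable class.
    votes = [0] * num_classes
    for p in probs:
        votes[max(range(num_classes), key=p.__getitem__)] += 1

    # Assumed winner rule: the top-voted class wins only if every
    # classifier that covers it voted for it.
    winner = max(range(num_classes), key=votes.__getitem__)
    covering = [i for i, s in enumerate(subsets) if winner in s]
    if votes[winner] == len(covering):
        active = covering                  # rely on the agreeing classifiers
    else:
        active = list(range(len(probs)))   # no clear winner: activate all

    # Average the active classifiers and threshold the confidence.
    mean = [sum(probs[i][k] for i in active) / len(active)
            for k in range(num_classes)]
    label = max(range(num_classes), key=mean.__getitem__)
    conf = mean[label]
    return (label if conf >= tau else None), conf


# Toy setup: one generalist and two specialists over three classes.
subsets = [{0, 1, 2}, {0, 1}, {1, 2}]
agreeing = [[0.9, 0.05, 0.05], [0.95, 0.05, 0.0], [0.0, 0.6, 0.4]]
disagreeing = [[0.4, 0.35, 0.25], [0.3, 0.7, 0.0], [0.0, 0.2, 0.8]]
clean_result = ensemble_predict(agreeing, subsets, tau=0.5)
fooled_result = ensemble_predict(disagreeing, subsets, tau=0.5)
```

In this toy setup the agreeing case is accepted with high averaged confidence, while the disagreeing case activates all classifiers, ends with a low averaged confidence, and is rejected (label `None`).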

Abstract

We propose to use an ensemble of diverse specialists, where speciality is defined according to the confusion matrix. Indeed, we observed that adversarial instances originating from a given class tend to be labeled into a small subset of (incorrect) classes. Therefore, we argue that an ensemble of specialists should be better able to identify and reject fooling instances, with a high entropy (i.e., disagreement) over the decisions in the presence of adversaries. Experimental results confirm that interpretation, opening a way to make the system more robust to adversarial examples through a rejection mechanism, rather than trying to classify them properly at any cost.
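
One way to read "entropy (i.e., disagreement) over the decisions" is as the Shannon entropy of the ensemble members' predicted labels. The sketch below illustrates that reading only; the function name `vote_entropy` and the choice to measure entropy over hard votes (rather than over averaged probabilities) are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def vote_entropy(predicted_labels):
    """Shannon entropy (in nats) of the labels predicted by ensemble members.

    0.0 means all members agree; larger values mean more disagreement.
    """
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    # Sum -p * log(p) over the classes that received at least one vote.
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


# Hypothetical votes from a four-member ensemble.
unanimous = vote_entropy([3, 3, 3, 3])   # every member predicts class 3
split = vote_entropy([0, 1, 2, 3])       # every member predicts a different class
```

Under this reading, an input whose vote entropy is high would be a candidate for rejection, while a unanimous vote gives zero entropy.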

Paper Structure

This paper contains 14 sections, 3 equations, 7 figures, 1 algorithm.

Figures (7)

  • Figure 1: Confusion matrices of adversaries for (a) MNIST and (b) CIFAR-10. These matrices have been computed from 5000 randomly selected FGS training adversaries (500 per class).
  • Figure 2: Confidence densities on MNIST: (a) naive CNN*, (b) pure ensemble, and (c) specialists+1.
  • Figure 3: Confidence densities on CIFAR-10: (a) naive CNN*, (b) pure ensemble, and (c) specialists+1.
  • Figure 4: Average distortion to MNIST and CIFAR-10 samples by FGS, DF, and the method of Szegedy et al. (2013). The average misclassification confidences are shown in blue text.
  • Figure 5: Error rates $E_{D}$ on clean test samples, and error rates $E_{A}$ on their corresponding adversaries, as a function of threshold ($\tau$), for the MNIST and CIFAR-10 datasets.
  • ...and 2 more figures