Robustness to Adversarial Examples through an Ensemble of Specialists
Mahdieh Abbasi, Christian Gagné
TL;DR
The paper tackles CNN vulnerability to adversarial examples by reframing robustness as open-set recognition and proposing a specialists+1 ensemble. It builds diverse class-specialist CNNs from confusion-matrix-derived subsets plus a generalist, and introduces a voting mechanism to either rely on a confident winner or activate all classifiers when uncertainty is high. Empirical results on MNIST and CIFAR-10 show reduced confidence for adversaries and effective rejection at suitable thresholds, with minimal impact on clean samples. The findings suggest robustness can be enhanced by rejection rather than aggressive reclassification, offering a practical defense against a range of adversarial attacks.
Abstract
We propose an ensemble of diverse specialists, where specialty is defined according to the confusion matrix. Indeed, we observed that adversarial instances originating from a given class tend to be labeled into a small subset of (incorrect) classes. Therefore, we argue that an ensemble of specialists should be better able to identify and reject fooling instances, since its decisions show high entropy (i.e., disagreement) in the presence of adversaries. The experimental results confirm this interpretation, opening a way to make the system more robust to adversarial examples through a rejection mechanism, rather than trying to classify them properly at any cost.
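As an illustration of rejection driven by ensemble disagreement, the following minimal sketch averages the class probabilities of several ensemble members and rejects the input when the entropy of that average exceeds a threshold. It is not the paper's voting mechanism: the plain averaging rule, the function names (`mean_prediction`, `entropy`, `classify_or_reject`), the threshold value, and the toy probabilities are all assumptions made for this example.

```python
import math


def mean_prediction(member_probs):
    """Average the class-probability vectors of all ensemble members."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def classify_or_reject(member_probs, max_entropy):
    """Return (label, entropy); label is None when the input is rejected.

    max_entropy is a hypothetical threshold that would be tuned on held-out data.
    """
    avg = mean_prediction(member_probs)
    h = entropy(avg)
    if h > max_entropy:
        return None, h  # members disagree too much: reject instead of guessing
    label = max(range(len(avg)), key=avg.__getitem__)
    return label, h


# Toy inputs (made up): three members that agree, then three that disagree.
agree = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.85, 0.10, 0.05]]
disagree = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

label_agree, h_agree = classify_or_reject(agree, max_entropy=0.8)
label_disagree, h_disagree = classify_or_reject(disagree, max_entropy=0.8)
```

With these toy inputs, the agreeing members yield a low-entropy average and a class label, while the disagreeing members yield a near-uniform average that is rejected. In practice the threshold trades off rejected adversaries against rejected clean samples.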
