Adversarial Detection by Approximation of Ensemble Boundary

T. Windeatt

TL;DR

Deep neural networks for object recognition are vulnerable to adversarial perturbations that humans barely notice. The author proposes an ensemble of binary DNN classifiers combined using Walsh (Rademacher-Walsh) coefficients to approximate the joint decision boundary between clean and adversarial images, with the boundary complexity controlled by the order $W_y$. The paper shows that higher-order Walsh coefficients capture boundary curvature changes induced by adversarial patterns and that the detector using $W_2$ and the difference $W_1/W_2$ achieves separation between clean and adversarial samples across multiple datasets and architectures; code and details of the architectures are also released. The work offers a principled, interpretable framework for adversarial detection via boundary modeling, while acknowledging limitations (two-class, image-domain) and the need to evaluate adaptive attacks and multi-class extensions such as ECOC.

Abstract

Despite being effective in many application areas, Deep Neural Networks (DNNs) are vulnerable to attack. In object recognition, the attack takes the form of a small perturbation added to an image that causes the DNN to misclassify, yet to a human the image appears no different. Adversarial attacks lead to defences that are themselves subject to attack, and the attack/defence strategies provide important information about the properties of DNNs. In this paper, a novel method of detecting adversarial attacks is proposed for an ensemble of DNNs solving two-class pattern recognition problems. The ensemble is combined using Walsh coefficients, which are capable of approximating Boolean functions and thereby controlling the decision boundary complexity. The hypothesis in this paper is that decision boundaries with high curvature allow adversarial perturbations to be found, but that these perturbations change the curvature of the decision boundary, which is then approximated by Walsh coefficients in a different way than for clean images. Besides controlling boundary complexity, the coefficients also measure the correlation with class labels, which may aid in understanding the learning and transferability properties of DNNs. While the experiments here use images, the proposed approach of modelling two-class ensemble decision boundaries could in principle be applied to any application area.
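
To make the idea of order-limited Walsh coefficients concrete, the following is a minimal sketch and not the paper's released code. It assumes each ensemble member outputs a decision coded as +1 or -1, and it estimates each coefficient as the correlation between the class label and the product (the XOR in ±1 coding) of a subset of member outputs. The function names `walsh_coefficients`, `walsh_decision`, and `error_rate`, the tie-breaking rule, and the toy data are all illustrative assumptions.

```python
from itertools import combinations
from math import prod


def walsh_coefficients(outputs, labels, max_order):
    """Estimate Walsh coefficients up to max_order (hypothetical helper).

    outputs: list of tuples, one +1/-1 decision per ensemble member.
    labels:  list of +1/-1 class labels, one per pattern.
    The coefficient for a subset of members is the mean of
    label * product of that subset's outputs (order = subset size).
    """
    n_members = len(outputs[0])
    coeffs = {}
    for order in range(max_order + 1):
        for subset in combinations(range(n_members), order):
            total = sum(
                y * prod(x[i] for i in subset)
                for x, y in zip(outputs, labels)
            )
            coeffs[subset] = total / len(labels)
    return coeffs


def walsh_decision(x, coeffs):
    """Combine member outputs x using the given coefficients.

    Assumption: a score of exactly zero is resolved as +1.
    """
    score = sum(c * prod(x[i] for i in subset) for subset, c in coeffs.items())
    return 1 if score >= 0 else -1


def error_rate(outputs, labels, coeffs):
    """Fraction of patterns whose combined decision disagrees with the label."""
    wrong = sum(walsh_decision(x, coeffs) != y for x, y in zip(outputs, labels))
    return wrong / len(labels)


# Toy data (assumed): two members whose XOR-like interaction defines the label.
toy_outputs = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
toy_labels = [a * b for a, b in toy_outputs]

first_order = walsh_coefficients(toy_outputs, toy_labels, max_order=1)
second_order = walsh_coefficients(toy_outputs, toy_labels, max_order=2)
```

On this toy problem the first-order coefficients are all zero, because neither member alone correlates with the label, so the order-1 combination cannot separate the classes. Adding the single second-order coefficient reproduces the labels exactly, which illustrates how raising the order increases the complexity of the boundary that can be represented.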

Paper Structure (12 sections, 6 equations, 7 figures, 5 tables)

This paper contains 12 sections, 6 equations, 7 figures, 5 tables.

Introduction
Ensembles Combined using Walsh Coefficients
Adversarial robustness
Boundary Theory of Adversarial Patterns
Adversarial Attacks/Defences
Experimental evidence
Data-sets and classifiers
Definition of terms
Experimental results for DNN ensemble
Discussion of Results
Code Availability and classifier architecture
Conclusion

Figures (7)

Figure 1: Train error versus Walsh coefficient order for clean and DeepFool (DF) training images
Figure 2: Walsh decision probability versus coefficient order for clean and DeepFool (DF) training images
Figure 3: Walsh decision probability versus coefficient order for Test and five ADVs
Figure 4: Ensemble train and test error of dog/cat versus Walsh coefficient order for train/test set before splitting
Figure 5: Typical TEACC/ADVREJ curve as $W_{1}/W_{2}$ detection threshold is varied
...and 2 more figures
