Adversarial Examples Detection with Bayesian Neural Network
Yao Li, Tongyi Tang, Cho-Jui Hsieh, Thomas C. M. Lee
TL;DR
This work tackles adversarial example detection by introducing BATer, a detector that exploits the stochasticity of Bayesian Neural Networks to generate distributional representations of hidden-layer outputs. By measuring dispersion between a test input’s hidden-layer distributions and class-conditional references across multiple layers using $W_1$ distance and combining these scores with a logistic classifier, BATer achieves strong detection performance while maintaining practicality through layer selection and PCA-based dimensionality reduction. The method demonstrates superior or competitive results against several state-of-the-art detectors on MNIST, CIFAR-10, and Imagenet-Sub, and shows robustness to transfer and adaptive attacks. The approach highlights the value of incorporating randomness in detectors to reveal distributional differences between natural and adversarial data, potentially guiding future trustworthy ML defenses.
Abstract
In this paper, we propose a new framework to detect adversarial examples, motivated by the observations that random components can improve the smoothness of predictors and make it easier to simulate the output distribution of a deep neural network. Building on these observations, we propose a novel Bayesian adversarial example detector, BATer for short, to improve the performance of adversarial example detection. Specifically, we study the distributional difference of hidden layer output between natural and adversarial examples, and propose to use the randomness of the Bayesian neural network to simulate the hidden layer output distribution and leverage the distribution dispersion to detect adversarial examples. The advantage of a Bayesian neural network is that its output is stochastic, a property that a deep neural network without random components lacks. Empirical results on several benchmark datasets against popular attacks show that the proposed BATer outperforms state-of-the-art detectors in adversarial example detection.
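The dispersion score described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each hidden-layer output has already been reduced to a single scalar feature, and it replaces the Bayesian neural network with a hypothetical stand-in, `stochastic_feature`, that adds Gaussian noise to a fixed value to mimic repeated stochastic forward passes. The helper `w1_distance` and all numeric values are likewise illustrative assumptions.

```python
import random


def w1_distance(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    For equal-size samples, W_1 is the mean absolute difference
    between the sorted values (order statistics).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal-size samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def stochastic_feature(center, rng, noise_scale=0.1, n_passes=64):
    """Hypothetical stand-in for n_passes stochastic forward passes.

    A real detector would read a hidden-layer output from a Bayesian
    neural network; here we only simulate a noisy scalar feature.
    """
    return [center + rng.gauss(0.0, noise_scale) for _ in range(n_passes)]


rng = random.Random(0)

# Class-conditional reference distribution (assumed built from training data).
reference = stochastic_feature(0.0, rng)

# Assumption for illustration: a natural input stays close to the reference,
# while an adversarial input is shifted away from it at this layer.
natural = stochastic_feature(0.05, rng)
adversarial = stochastic_feature(1.0, rng)

score_nat = w1_distance(natural, reference)
score_adv = w1_distance(adversarial, reference)

print(f"natural score:     {score_nat:.3f}")
print(f"adversarial score: {score_adv:.3f}")
```

The sketch stops at a single per-layer score. In the approach summarized in the TL;DR, such scores from several selected layers are combined by a logistic classifier to produce the final detection decision.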
