Post-train Black-box Defense via Bayesian Boundary Correction

He Wang; Yunfeng Diao

TL;DR

This paper tackles adversarial vulnerability in deep classifiers by introducing Bayesian Boundary Correction (BBC), a post-train black-box defense that requires no re-training of the victim model. BBC builds a joint Bayesian, energy-based model over clean data, adversarial examples, and the classifier, and appends a small network behind the pre-trained model to realize Bayesian model averaging in a post-train setting. Domain-specific distance functions capture the adversarial distribution near the data manifold: perceptual distance for images, and motion dynamics for skeleton-based human activity recognition (S-HAR). Inference uses Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) to approximate the posterior over the appended model's parameters. Across image classification and S-HAR benchmarks, BBC improves robustness against white-box and black-box attacks while largely preserving benign accuracy, all without retraining the victim.
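As a rough illustration of the two ingredients named above (SGHMC sampling and Bayesian model averaging over an appended model), here is a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the appended model is a single linear layer on top of frozen victim logits, the posterior is a toy Gaussian centered at the identity matrix, and the names `sghmc_step`, `bma_predict`, `eta`, and `alpha` are hypothetical. The update follows a common SGHMC parameterization with friction `alpha` and step size `eta`, not necessarily the sampler settings used by BBC.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sghmc_step(theta, v, grad_u, rng, eta=1e-2, alpha=0.1):
    # Friction-damped momentum update with injected Gaussian noise.
    noise = rng.normal(scale=np.sqrt(2.0 * alpha * eta), size=theta.shape)
    v = (1.0 - alpha) * v - eta * grad_u(theta) + noise
    return theta + v, v

def bma_predict(victim_logits, weight_samples):
    # Average class probabilities over the sampled appended layers.
    return np.mean([softmax(victim_logits @ w) for w in weight_samples], axis=0)

rng = np.random.default_rng(0)
num_classes = 3
# Stand-in for the outputs of a frozen, pre-trained victim (never updated here).
victim_logits = rng.normal(size=(4, num_classes))

# Toy potential U(W) = 0.5 * ||W - I||^2, so grad U(W) = W - I (hypothetical posterior).
grad_u = lambda w: w - np.eye(num_classes)

theta = np.eye(num_classes)
v = np.zeros_like(theta)
samples = []
for step in range(1, 201):
    theta, v = sghmc_step(theta, v, grad_u, rng)
    if step > 100 and step % 20 == 0:  # discard burn-in, then thin the chain
        samples.append(theta.copy())

probs = bma_predict(victim_logits, samples)  # shape: (4, num_classes)
```

Only the appended layer's weights are sampled; the victim's outputs are treated as fixed inputs, which mirrors the post-train idea of leaving the pre-trained model untouched.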

Abstract

Classifiers based on deep neural networks are susceptible to adversarial attacks, and this widespread vulnerability has motivated research on defending them against potential threats. Given a vulnerable classifier, existing defense methods are mostly white-box and often require re-training the victim under modified loss functions or training regimes. Because the model, data, and training specifics of the victim are usually unavailable to the user, re-training is unappealing, if not impossible, for reasons such as limited computational resources. To this end, we propose a new post-train black-box defense framework. It can turn any pre-trained classifier into a resilient one with little knowledge of the model specifics. This is achieved by a new joint Bayesian treatment of the clean data, the adversarial examples, and the classifier, which maximizes their joint probability. It is further equipped with a new post-train strategy that keeps the victim intact, avoiding re-training. We name our framework Bayesian Boundary Correction (BBC). BBC is a general and flexible framework that can easily adapt to different data types. We instantiate BBC for image classification and skeleton-based human activity recognition, covering both static and dynamic data. Exhaustive evaluation shows that BBC achieves superior robustness compared with existing defense methods, without severely hurting clean accuracy.

Paper Structure (29 sections, 17 equations, 3 figures, 12 tables, 1 algorithm)

Introduction
Related Work
Methodology
Joint Distribution of Data and Adversaries
Connections to Existing Defense Methods
Bayesian Classifier for Further Robustness
Necessity of a Post-train Bayesian Strategy
Inference on BBC
Instantiating BBC for Different Data and Tasks
Perceptual Distance for BBC in Image Classification
Natural Motion Manifold for BBC in S-HAR
Experiments
Experiments on Image Classification
Experimental Settings
Robustness under White-box Attacks
...and 14 more sections

Figures (3)

Figure 1: Attack success rate vs. attack strength curves against SMART on NTU60. In each subplot, the abscissa is the number of iterations and the ordinate is the attack success rate (%). ST means standard training.
Figure 2: Comparisons with TRADES under different perturbation budgets ($\epsilon$) on NTU60 with SGN. (a): standard accuracy vs. $\epsilon$; (b): robustness against SMART with 20 to 1000 iterations.
Figure 3: The components of the expected loss gradients of BBC on CIFAR-10 with WRN28-10. $N=0$ is standard training. (a): the values of the expected gradient components (EGC); (b): the percentage of the component magnitude (PCM) above and below $10^{-10}$.
