Certified Zeroth-order Black-Box Defense with Robust UNet Denoiser

Astha Verma; A V Subramanyam; Siddhesh Bangar; Naman Lal; Rajiv Ratn Shah; Shin'ichi Satoh


TL;DR

The paper addresses certified robustness for black-box models under zeroth-order access. It introduces a robust UNet-based denoiser (RDUNet) and two defense pipelines (ZO-RUDS and ZO-AE-RUDS) that prepend denoising before the target model, enabling certification within an $L_2$-radius $R_c$. The approach leverages a multi-term objective including cross-entropy, cosine-feature similarity, and MMD losses, and exploits both randomized gradient estimates and coordinate-wise estimates in reduced latent spaces to mitigate ZO variance. Empirical results across CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet, and MNIST reconstruction demonstrate significant improvements over state-of-the-art methods in both classification and reconstruction tasks, highlighting practical applicability to high-dimensional data without accessing model parameters.
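The multi-term objective named above (cross-entropy, cosine-feature similarity, and MMD) can be illustrated with a minimal NumPy sketch. Only the three term names come from the paper. The weights `lam_cos` and `lam_mmd`, the RBF kernel and its `bandwidth`, the biased MMD estimate, and the choice of which features are compared are all hypothetical placeholders, not the paper's formulation. In the black-box setting the cross-entropy term would be computed from queried outputs of $f$, with its gradient estimated by ZO methods rather than by backpropagation.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Mean cross-entropy using a numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def cosine_feature_loss(feat_a, feat_b, eps=1e-12):
    # 1 minus the mean cosine similarity between paired feature vectors.
    dot = (feat_a * feat_b).sum(axis=1)
    norms = np.linalg.norm(feat_a, axis=1) * np.linalg.norm(feat_b, axis=1)
    return 1.0 - (dot / (norms + eps)).mean()

def rbf_mmd2(x, y, bandwidth=1.0):
    # Biased estimate of the squared MMD under a Gaussian (RBF) kernel.
    def kernel(a, b):
        sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()

def combined_loss(logits, labels, feat_denoised, feat_clean,
                  lam_cos=1.0, lam_mmd=1.0):
    # Hypothetical weighted sum of the three terms named in the TL;DR.
    return (cross_entropy(logits, labels)
            + lam_cos * cosine_feature_loss(feat_denoised, feat_clean)
            + lam_mmd * rbf_mmd2(feat_denoised, feat_clean))
```

When the denoised and clean features coincide, the cosine and MMD terms both vanish and the loss reduces to the cross-entropy term; they grow as the denoised feature distribution drifts away from the clean one.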

Abstract

Certified defense methods against adversarial perturbations have recently been investigated in the black-box setting from a zeroth-order (ZO) perspective. However, these methods suffer from high model variance and low performance on high-dimensional datasets due to the ineffective design of the denoiser, and they make limited use of ZO techniques. To this end, we propose a certified ZO preprocessing technique that removes adversarial perturbations from the attacked image in the black-box setting using only model queries. We propose a robust UNet denoiser (RDUNet) that ensures the robustness of black-box models trained on high-dimensional datasets. We propose a novel black-box denoised smoothing (DS) defense mechanism, ZO-RUDS, which prepends our RDUNet to the black-box model to defend it in the black-box setting. We further propose ZO-AE-RUDS, in which RDUNet followed by an autoencoder (AE) is prepended to the black-box model. We perform extensive experiments on four classification datasets, CIFAR-10, CIFAR-100, Tiny ImageNet, and STL-10, and on the MNIST dataset for image reconstruction. Our proposed defense methods ZO-RUDS and ZO-AE-RUDS outperform the SOTA by large margins of $35\%$ and $9\%$, respectively, on the low-dimensional CIFAR-10 dataset, and by $20.61\%$ and $23.51\%$, respectively, on the high-dimensional STL-10 dataset.
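Because the defense only queries the black-box model, gradients must be estimated from function values. The RGE (randomized) and CGE (coordinate-wise) estimators mentioned in the TL;DR and in Figure 4 are commonly written as forward differences; the minimal NumPy sketch below assumes that form, a flat input vector, random directions drawn uniformly from the unit sphere for RGE, and a hypothetical smoothing radius `mu`. `loss_fn` is a placeholder for any scalar loss computed from black-box queries; the default `q=192` mirrors the query count reported in Figure 4.

```python
import numpy as np

def rge(loss_fn, x, q=192, mu=5e-3, seed=None):
    """Randomized gradient estimate from q random directions (q + 1 queries)."""
    rng = np.random.default_rng(seed)
    d = x.size
    f0 = loss_fn(x)
    grad = np.zeros(d)
    for _ in range(q):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)  # uniform direction on the unit sphere
        grad += d * (loss_fn(x + mu * u) - f0) / mu * u
    return grad / q

def cge(loss_fn, x, mu=5e-3):
    """Coordinate-wise gradient estimate (d + 1 queries)."""
    f0 = loss_fn(x)
    grad = np.zeros(x.size)
    for i in range(x.size):
        e_i = np.zeros(x.size)
        e_i[i] = 1.0  # i-th standard basis vector
        grad[i] = (loss_fn(x + mu * e_i) - f0) / mu
    return grad
```

RGE costs $q+1$ queries regardless of dimension, but its variance grows with the dimension $d$; CGE has no sampling variance but costs $d+1$ queries, which is why it is only practical in a reduced latent space.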


Paper Structure

This paper contains 17 sections, 18 equations, 6 figures, 10 tables, and 2 algorithms.

Introduction
Related Work
Empirical Defense
Certified Defense
ZO Optimization for Adversarial Learning
Preliminaries
Methodology
Proposed Robust Architecture
Proposed Objective Function
Proposed Black-Box Defense Methods
Experimental Settings
Comparison with SOTA
Performance on Image Classification
Performance on Image Reconstruction
Ablation Study
...and 2 more sections

Figures (6)

Figure 1: Comparison with four previous certified defense methods: RS [cohen2019certified], DS (W) and DS (B) [salman2020denoised], and the ZO-optimization approaches ZO-DS and ZO-AE-DS [zhang2022robustify]. 'W' and 'B' denote the white-box setting (the defense can use the weights of the target model $f$) and the black-box setting, respectively. '$x$' - input sample, '$\eta$' - noise, 'E' - Encoder, 'D' - Decoder, '$f$' - target model, '$f_{s}$' - surrogate model (proxy of $f$), and '$z$' - latent feature vector.
Figure 2: An overview of the proposed certified defense mechanism via the robust UNet denoiser RDUNet. Noise $\eta$ is added to the input sample $x$, and the resulting noisy image $x^{\ast}$ is given as input to the robust denoiser. The denoiser outputs a residual map which, when added to the noisy image $x^{\ast}$, gives the denoised output $\hat{x}$. The denoised output is fed to the autoencoder, whose output is then sent to the black-box model $f$.
Figure 3: Architecture of our defense technique ZO-AE-RUDS. The decoder 'D' and the target model 'f' constitute the black-box architecture; therefore, their parameters are not learned.
Figure 4: Comparison of certified accuracy on low-dimensional (CIFAR-10, CIFAR-100) and high-dimensional (STL-10, Tiny ImageNet) datasets for different $l_{2}$ radii at $q = 192$ queries. 'R' - RGE and 'C' - CGE ZO techniques.
Figure 5: Effect of different denoisers on our RGE- and CGE-based ZO defense approaches for different $l_{2}$ radii at $q = 192$ queries. Dataset: CIFAR-10.
...and 1 more figure
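Figure 2 traces the data flow through the defense, and Figures 4 and 5 report certified accuracy at given $l_{2}$ radii. The sketch below shows how such a pipeline could be certified, assuming the Gaussian randomized-smoothing certificate of Cohen et al. (cohen2019certified) with $\eta \sim \mathcal{N}(0, \sigma^2 I)$ and radius $R = \sigma\,\Phi^{-1}(\underline{p_A})$. The `denoiser`, `autoencoder`, and `black_box` arguments are hypothetical callables standing in for RDUNet, the AE, and $f$. A Hoeffding lower bound replaces the Clopper-Pearson bound used by Cohen et al. so that only NumPy and the standard library are needed.

```python
import numpy as np
from statistics import NormalDist

def pipeline_predict(x, denoiser, autoencoder, black_box, sigma, rng):
    # Figure 2 data flow: x* = x + eta, x_hat = x* + residual, then AE, then f.
    x_star = x + sigma * rng.standard_normal(x.shape)
    x_hat = x_star + denoiser(x_star)  # denoiser returns a residual map
    return black_box(autoencoder(x_hat))

def certify(x, denoiser, autoencoder, black_box, sigma,
            n0=100, n=1000, alpha=0.001, seed=0):
    # Cohen-style certification; Hoeffding bound swapped in for Clopper-Pearson.
    rng = np.random.default_rng(seed)

    def sample():
        return pipeline_predict(x, denoiser, autoencoder, black_box, sigma, rng)

    # Step 1: choose the candidate top class from a separate selection sample.
    selection = [sample() for _ in range(n0)]
    top = max(set(selection), key=selection.count)
    # Step 2: one-sided lower confidence bound on P[prediction == top].
    hits = sum(sample() == top for _ in range(n))
    p_lower = hits / n - np.sqrt(np.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # abstain: no certificate
    # R = sigma * Phi^{-1}(p_lower), the binary-case radius of Cohen et al.
    return top, sigma * NormalDist().inv_cdf(p_lower)

def certified_accuracy(preds, radii, labels, r):
    # Fraction of points classified correctly with certified radius >= r.
    ok = [p == y and rad >= r for p, rad, y in zip(preds, radii, labels)]
    return sum(ok) / len(ok)
```

For ZO-RUDS, which has no autoencoder stage, an identity function can be passed as `autoencoder`. Sweeping `r` in `certified_accuracy` over a grid of radii yields curves of the kind plotted in Figures 4 and 5.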
