GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models

Zaitang Li; Pin-Yu Chen; Tsung-Yi Ho

TL;DR

This paper makes the first attempt to present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models, and shows high correlation and significantly reduced computation cost of GREAT Score when compared to the attack-based model ranking on RobustBench.

Abstract

Current studies on adversarial robustness mainly focus on aggregating local robustness results from a set of data samples to evaluate and rank different models. However, these local statistics may not represent the true global robustness of the underlying unknown data distribution well. To address this challenge, this paper makes the first attempt to present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models. Formally, GREAT Score carries the physical meaning of a global statistic capturing a mean certified attack-proof perturbation level over all samples drawn from a generative model. For finite-sample evaluation, we also derive a probabilistic guarantee on the sample complexity and the difference between the sample mean and the true mean. GREAT Score has several advantages: (1) Robustness evaluations using GREAT Score are efficient and scalable to large models because they avoid the need to run adversarial attacks. In particular, we show high correlation and significantly reduced computation cost of GREAT Score when compared to the attack-based model ranking on RobustBench (Croce et al., 2021). (2) The use of generative models facilitates the approximation of the unknown data distribution. In our ablation study with different generative adversarial networks (GANs), we observe consistency between global robustness evaluation and the quality of GANs. (3) GREAT Score can be used for remote auditing of privacy-sensitive black-box models, as demonstrated by our robustness evaluation on several online facial recognition services.
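The abstract describes GREAT Score as a mean of a certified local perturbation level over samples drawn from a generative model, estimated in practice by a finite sample mean. The following is a minimal sketch of that sample-mean step only, under stated assumptions: `clipped_margin`, `toy_generator`, and `toy_classifier` are hypothetical stand-ins introduced for illustration and are not the paper's local score, generator, or classifier. The exact local score is the one defined in Theorem 1.

```python
import random
from typing import Callable, List, Sequence, Tuple


def clipped_margin(probs: Sequence[float], true_class: int) -> float:
    """Hypothetical local score: true-class probability minus the best
    other-class probability, clipped at zero (zero when misclassified)."""
    best_other = max(p for k, p in enumerate(probs) if k != true_class)
    return max(probs[true_class] - best_other, 0.0)


def toy_generator(rng: random.Random) -> Tuple[float, int]:
    """Hypothetical generator: a scalar 'image' in [0, 1] and its label."""
    x = rng.random()
    return x, (1 if x >= 0.5 else 0)


def toy_classifier(x: float) -> List[float]:
    """Hypothetical 2-way classifier returning class likelihoods."""
    return [1.0 - x, x]


def sample_mean_score(
    classifier: Callable[[float], Sequence[float]],
    generator: Callable[[random.Random], Tuple[float, int]],
    local_score: Callable[[Sequence[float], int], float],
    n_samples: int,
    seed: int = 0,
) -> float:
    """Average the local score over n_samples generated (sample, label) pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x, label = generator(rng)
        total += local_score(classifier(x), label)
    return total / n_samples
```

Calling `sample_mean_score(toy_classifier, toy_generator, clipped_margin, n_samples=500)` returns one scalar per model. Only forward passes of the classifier are needed, which is why an evaluation of this kind requires no adversarial attack and can be run against a black-box model.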

Paper Structure (33 sections, 7 theorems, 24 equations, 12 figures, 10 tables, 1 algorithm)

Introduction
Background and Related Works
GREAT Score: Methodology and Algorithms
True Global Robustness and Certified Estimate
Using GMs to Evaluate Global Robustness
Probabilistic Guarantee on Sample Mean
Algorithm and Computational Complexity
Calibrated GREAT Score
Experimental Results
Experiment Setup
Local and Global Robustness Analysis
Model Ranking on CIFAR-10 and ImageNet
Ablation Study and Run-time Analysis
Evaluation on Online Facial Recognition APIs
Conclusion
...and 18 more sections

Key Result

Theorem 1

Let $f:[0,1]^d \mapsto [0,1]^K$ be a $K$-way classifier and let $f_k(\cdot)$ be the predicted likelihood of class $k$, with $c$ denoting the ground-truth class. Let $G$ be a generator that produces a sample $G(z)$ with $z \sim \mathcal{N}(0,I)$. Define $g\left(G(z)\right) = \sqrt{\cfrac{\pi}{

Figures (12)

Figure 1: Comparison of local GREAT Score and CW attack in $\mathcal{L}_2$ perturbation on CIFAR-10 with the Rebuffi_extra model (Rebuffi et al., 2021). The x-axis is the image ID. The result shows that the local GREAT Score is indeed a lower bound of the perturbation level found by the CW attack.
Figure 2: Cumulative robust accuracy (RA) with varying $\mathcal{L}_2$ perturbation level using 500 samples. Note that GREAT Score gives a certified RA for attack-proof robustness, whereas Auto-Attack is an empirical robustness evaluation.
Figure 3: Robustness evaluation on ImageNet using GREAT Score, RobustBench (with test set), and Auto-Attack (with generated samples). The Spearman's rank correlation coefficient for GREAT Score vs. RobustBench and Auto-Attack vs. RobustBench is 0.9 and 0.872, respectively.
Figure 4: Run-time improvement (GREAT Score over Auto-Attack) on 500 generated CIFAR-10 images.
Figure 5: The Flow Chart of GREAT Score.
...and 7 more figures
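
The Figure 3 caption compares model rankings using Spearman's rank correlation coefficient. As a minimal sketch of how such a coefficient is computed, the following ranks two score lists (averaging ranks over ties) and takes the Pearson correlation of the ranks. The function names and the example scores are hypothetical and are not taken from the paper's experiments.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five models under two evaluation methods.
method_a = [0.31, 0.42, 0.55, 0.61, 0.70]
method_b = [48.0, 51.5, 56.2, 60.9, 59.3]  # last two models swap order
rho = spearman(method_a, method_b)
```

With these illustrative inputs, the two methods disagree only on the order of the last two models, so the coefficient stays close to 1. A value near 1 means the two evaluation methods order the models almost identically, even when their raw scales differ.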

Theorems & Definitions (8)

Definition 1: True global robustness w.r.t. $P$
Theorem 1: certified global robustness estimate
Theorem 2: probabilistic guarantee on sample mean
Lemma 1: Lipschitz continuity in gradient form (Paulavičius and Žilinskas, 2006)
Lemma 2: Formal guarantee on lower bound for untargeted attack (Theorem 3.2 in Weng et al., 2018)
Lemma 3: Stein's lemma (Stein, 1981)
Lemma 4: Proof of global Lipschitz constant
Lemma 5: Concentration inequality (Theorem 3.1 in Maurer and Pontil, 2021)
