Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model?

Anna Chistyakova; Mikhail Pautov

TL;DR

Contract And Conquer (CAC) is an approach for provably computing adversarial examples for neural networks in a black-box manner. It combines knowledge distillation of the black-box model on an expanding distillation dataset with precise contraction of the adversarial example search space.

Abstract

Black-box adversarial attacks are widely used as tools to test the robustness of deep neural networks against malicious perturbations of input data aimed at a specific change in the output of the model. Such methods, although they remain empirically effective, usually do not guarantee that an adversarial example can be found for a particular model. In this paper, we propose Contract And Conquer (CAC), an approach to provably compute adversarial examples for neural networks in a black-box manner. The method is based on knowledge distillation of a black-box model on an expanding distillation dataset and precise contraction of the adversarial example search space. CAC is supported by a transferability guarantee: we prove that the method yields an adversarial example for the black-box model within a fixed number of algorithm iterations. Experimentally, we demonstrate that the proposed approach outperforms existing state-of-the-art black-box attack methods on the ImageNet dataset for different target models, including vision transformers.

Paper Structure (20 sections, 1 theorem, 22 equations, 2 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Adversarial Attacks
Adversarial Defenses
Methodology
Background and Motivation
Description of CAC
Surrogate Model and White-box Attack
Adjustment of Attack Parameters
Convergence Guarantee
Experiments
Setup of Experiments
Datasets and Target Models
Surrogate Models and White-Box Attack
Baseline Methods
...and 5 more sections

Key Result

Lemma 3.4

Fix an input sample $x$ and the initial adversarial attack search space $U_\delta(x) = \{a: \|x-a\|_\infty \le \delta\}$. Suppose that for every $j \in \mathbb{Z}_{+}$, the white-box attack in Algorithm \ref{alg:adv_whitebox} yields an adversarial example for the model $S$. Let $S$ be a differentiable function [...] where $\|\cdot\|_{op, \infty}$ is the operator norm induced by the $l_\infty$ norm of vectors. Let the [...]

Figures (2)

Figure 1: Illustration of the contraction of the adversarial example search space. Given the algorithm iteration number $j$, the adversarial example search space on iteration $j$, namely $U_\delta(x)_j$, is the intersection of the $\rho_j$-vicinity of an adversarial example $z_j$ with the initial attack search space $U_{\delta}(x)$. Formally, $U_\delta(x)_j = U_{\delta}(x) \cap U_{\rho_j}(z_j)$. The quantity $\rho_j$ is defined in Eq. \ref{eq:rho}. For each algorithm iteration, the adversarial example search space is represented by the intersection of bold circles.
Figure 2: Schematic representation of the proposed method. Given alternation iteration $j$ and the target model $T$, we prepare the distillation dataset $\mathcal{D}(S)$ and train the surrogate model $S_j$. Then, $S_j$ is attacked at the target point $x$ in the white-box setting, and an adversarial example $z_j$ is computed. If $z_j$ is transferable to $T$, the algorithm returns $z_j$ and stops; otherwise, the adversarial example search space is contracted as shown in Fig. \ref{img:contraction}, $(z_j, T(z_j))$ is added to the distillation dataset, and the next instance of the surrogate model, $S_{j+1}$, is obtained.
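
The two captions above can be read together as a single loop: distill, attack, check transfer, contract. The following is a minimal sketch of that reading, not the authors' implementation. Several things here are assumptions made only for illustration: the function names (`linf_ball`, `contract_search_space`, `cac_loop`), the callables `train_surrogate`, `whitebox_attack`, and `rho`, the choice to seed the distillation dataset with $(x, T(x))$, and treating "adversarial" as a label change (untargeted). The value of $\rho_j$ comes from Eq. \ref{eq:rho}, which is not reproduced here, so it is passed in as a caller-supplied function. Since both sets are $l_\infty$ balls, their intersection is an axis-aligned box and can be stored as a pair of bounds.

```python
import numpy as np


def linf_ball(center, radius):
    """Return (lower, upper) bounds of the l_inf ball around `center`."""
    center = np.asarray(center, dtype=float)
    return center - radius, center + radius


def contract_search_space(x, delta, z_j, rho_j):
    """Bounds of U_delta(x) intersected with U_rho_j(z_j).

    Both sets are l_inf balls, so the intersection is an axis-aligned box:
    take the element-wise max of lower bounds and min of upper bounds.
    """
    lo_x, hi_x = linf_ball(x, delta)
    lo_z, hi_z = linf_ball(z_j, rho_j)
    lower = np.maximum(lo_x, lo_z)
    upper = np.minimum(hi_x, hi_z)
    if np.any(lower > upper):
        raise ValueError("empty intersection: z_j lies too far from U_delta(x)")
    return lower, upper


def cac_loop(x, delta, target, train_surrogate, whitebox_attack, rho, max_iters):
    """Hypothetical skeleton of the alternation in Figure 2.

    target(a)                      -> label of the black-box model T (query only)
    train_surrogate(dataset)       -> surrogate S_j (assumed helper)
    whitebox_attack(S, x, lo, hi)  -> candidate z_j inside the box [lo, hi]
    rho(j)                         -> contraction radius rho_j (assumed given)
    """
    x = np.asarray(x, dtype=float)
    clean_label = target(x)
    # Assumption: the distillation dataset starts with the clean point only.
    dataset = [(x, clean_label)]
    lower, upper = linf_ball(x, delta)
    for j in range(max_iters):
        surrogate = train_surrogate(dataset)
        z_j = whitebox_attack(surrogate, x, lower, upper)
        label_j = target(z_j)
        if label_j != clean_label:
            # z_j transfers to T: return it and stop.
            return z_j, j
        # Otherwise: grow the dataset and contract the search space.
        dataset.append((z_j, label_j))
        lower, upper = contract_search_space(x, delta, z_j, rho(j))
    return None, max_iters
```

A toy call could pass a thresholding lambda for `target` and simple stand-ins for the other callables. In practice `train_surrogate` and `whitebox_attack` would wrap a real distillation routine and a gradient-based attack constrained to the box, which this sketch deliberately leaves abstract.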

Theorems & Definitions (6)

Definition 3.1
Definition 3.2
Remark 3.3
Lemma 3.4
Remark 3.5
proof
