Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model?

Anna Chistyakova, Mikhail Pautov

TL;DR

Contract And Conquer (CAC) is an approach to provably compute adversarial examples for neural networks in a black-box manner. It relies on knowledge distillation of the black-box model on an expanding distillation dataset and on precise contraction of the adversarial example search space.

Abstract

Black-box adversarial attacks are widely used to test the robustness of deep neural networks against malicious perturbations of input data aimed at a specific change in the output of the model. Such methods, although empirically effective, usually do not guarantee that an adversarial example can be found for a particular model. In this paper, we propose Contract And Conquer (CAC), an approach to provably compute adversarial examples for neural networks in a black-box manner. The method is based on knowledge distillation of a black-box model on an expanding distillation dataset and precise contraction of the adversarial example search space. CAC is supported by a transferability guarantee: we prove that the method yields an adversarial example for the black-box model within a fixed number of algorithm iterations. Experimentally, we demonstrate that the proposed approach outperforms existing state-of-the-art black-box attack methods on the ImageNet dataset for different target models, including vision transformers.

Paper Structure

This paper contains 20 sections, 1 theorem, 22 equations, 2 figures, 9 tables, 1 algorithm.

Key Result

Lemma 3.4

Fix an input sample $x$ and the initial adversarial attack search space $U_\delta(x) = \{a: \|x-a\|_\infty \le \delta\}$. Suppose that for every $j \in \mathbb{Z}_{+}$, the white-box attack in Algorithm \ref{alg:adv_whitebox} yields an adversarial example for the model $S$. Let $S$ be a differentiable function [...] where $\|\cdot\|_{op, \infty}$ is the operator norm induced by the $l_\infty$ norm of vectors. Let the [...]

Figures (2)

  • Figure 1: Illustration of the contraction of the adversarial example search space. At algorithm iteration $j$, the adversarial example search space $U_\delta(x)_j$ is the intersection of the $\rho_j$-vicinity of the adversarial example $z_j$ with the initial attack search space $U_{\delta}(x)$. Formally, $U_\delta(x)_j = U_{\delta}(x) \cap U_{\rho_j}(z_j)$. The quantity $\rho_j$ is defined in Eq. \ref{eq:rho}. For each algorithm iteration, the adversarial example search space is represented by the intersection of bold circles.
  • Figure 2: Schematic representation of the proposed method. Given alternation iteration $j$ and the target model $T$, we prepare the distillation dataset $\mathcal{D}(S)$ and train the surrogate model $S_j$. Then, $S_j$ is attacked at the target point $x$ in the white-box setting, and an adversarial example $z_j$ is computed. If $z_j$ is transferable to $T$, the algorithm returns $z_j$ and stops; otherwise, the adversarial example search space is contracted as shown in Fig. 1, $(z_j, T(z_j))$ is added to the distillation dataset, and the next instance of the surrogate model, $S_{j+1}$, is obtained.
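
The loop described in the two captions can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. The names `contracted_box`, `cac_loop`, `fit_surrogate`, `whitebox_attack` and `rho_schedule` are hypothetical, the radius $\rho_j$ is taken from a caller-supplied function because Eq. \ref{eq:rho} is not reproduced here, and success is assumed to mean an untargeted label change, $T(z_j) \ne T(x)$. The one fact the sketch relies on is that the intersection of two $l_\infty$ balls is an axis-aligned box, so the contraction $U_{\delta}(x) \cap U_{\rho_j}(z_j)$ reduces to elementwise maxima and minima of the bounds.

```python
import numpy as np


def contracted_box(x, delta, z, rho):
    """Bounds of U_delta(x) ∩ U_rho(z); both sets are l_inf balls, so the result is a box."""
    lo = np.maximum(x - delta, z - rho)
    hi = np.minimum(x + delta, z + rho)
    return lo, hi


def cac_loop(target, x, delta, fit_surrogate, whitebox_attack,
             rho_schedule, dataset, max_iters=10):
    """Alternate distillation, white-box attack and search-space contraction.

    Assumptions (not taken from the paper):
      target(v)                      -> label returned by the black-box model T
      fit_surrogate(data)            -> surrogate S_j trained on the distillation data
      whitebox_attack(S, x, lo, hi)  -> candidate z_j inside the box [lo, hi]
      rho_schedule(j)                -> contraction radius rho_j
    """
    y = target(x)
    lo, hi = x - delta, x + delta          # initial search space U_delta(x)
    data = list(dataset)
    for j in range(max_iters):
        surrogate = fit_surrogate(data)
        z = np.clip(whitebox_attack(surrogate, x, lo, hi), lo, hi)
        label = target(z)                  # one query to the black-box model
        if label != y:                     # z_j transfers to T: stop
            return z, j
        data.append((z, label))            # expand the distillation dataset
        lo, hi = contracted_box(x, delta, z, rho_schedule(j))
    return None, max_iters
```

Because $z_j$ always lies in the current box, the contracted box is never empty. In practice `fit_surrogate` and `whitebox_attack` would wrap a distillation routine and a gradient-based attack; here they are left as parameters so that the control flow stays visible.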

Theorems & Definitions (6)

  • Definition 3.1
  • Definition 3.2
  • Remark 3.3
  • Lemma 3.4
  • Remark 3.5
  • proof