Constructing Semantics-Aware Adversarial Examples with a Probabilistic Perspective

Andi Zhang, Mingtian Zhang, Damon Wischik

TL;DR

A probabilistic perspective on adversarial examples is proposed that embeds a subjective understanding of semantics, expressed as a distribution, into the process of generating adversarial examples in a principled manner.

Abstract

We propose a probabilistic perspective on adversarial examples that allows us to embed a subjective understanding of semantics, expressed as a distribution, into the process of generating adversarial examples in a principled manner. Although our method makes larger pixel-level modifications than traditional adversarial attacks, it preserves the overall semantics of the image, so the changes are difficult for humans to detect. These extensive pixel-level modifications enhance our method's ability to deceive classifiers designed to defend against adversarial attacks. Our empirical findings indicate that the proposed methods achieve higher success rates in circumventing adversarial defense mechanisms while remaining difficult for human observers to detect.

Paper Structure (51 sections, 4 theorems, 30 equations, 9 figures, 2 tables, 4 algorithms)

Key Result

Theorem 1

Given the conditions that $p_\text{vic}(x_\text{adv} \mid y_\text{tar}) \propto \exp(-c_2\, f(x_\text{adv}, y_\text{tar}))$ and $p_\text{dis}(x_\text{adv} \mid x_\text{ori}) \propto \exp(-c_1\, \mathcal{D}(x_\text{ori}, x_\text{adv}))$, the samples drawn from $p_\text{adv}$ will exhibit the same distribution …
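Taking logarithms makes the two conditions easier to combine. Under the conditions of Theorem 1, the product form of $p_\text{adv}$ given in the Figure 2 caption means $p_\text{adv} \propto p_\text{dis}\, p_\text{vic}$. The short derivation below is our own restatement, not quoted from the paper; "const" collects the normalizing constants, none of which depend on $x_\text{adv}$:

```latex
\begin{aligned}
\log p_\text{adv}(x_\text{adv} \mid x_\text{ori}, y_\text{tar})
  &= \log p_\text{dis}(x_\text{adv} \mid x_\text{ori})
   + \log p_\text{vic}(x_\text{adv} \mid y_\text{tar}) + \text{const} \\
  &= -c_1\, \mathcal{D}(x_\text{ori}, x_\text{adv})
   - c_2\, f(x_\text{adv}, y_\text{tar}) + \text{const}, \\
\nabla_{x_\text{adv}} \log p_\text{adv}
  &= -c_1\, \nabla_{x_\text{adv}} \mathcal{D}(x_\text{ori}, x_\text{adv})
   - c_2\, \nabla_{x_\text{adv}} f(x_\text{adv}, y_\text{tar}).
\end{aligned}
```

The mode of $p_\text{adv}$ therefore minimizes $c_1\,\mathcal{D} + c_2\, f$, a weighted distance-plus-loss objective of the kind used in C&W-style attacks, and the gradient in the last line is all that a gradient-based sampler such as Langevin dynamics requires.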

Figures (9)

  • Figure 1: Adversarial examples generated by our method. Left: MNIST examples in which we injected the subjective semantic understanding that scaling, translation, and distortion preserve a digit's meaning. The adversarial examples remain interpretable as digits while applying these transformations (see Figure 3 for a comparison with other methods). Right: an adversarial example of a hamster image that leverages semantic knowledge from pre-trained diffusion models. Despite substantial pixel modifications, the image remains natural-looking (see Figure 4 for a comparison with other methods).
  • Figure 2: (a) and (b) display samples drawn from $p_\text{vic}(\cdot \mid y_\text{tar})$ when the victim classifier is non-adversarially trained and adversarially trained, respectively. (c) shows samples from $p_\text{dis}(\cdot \mid x_\text{ori})$ when $\mathcal{D}$ is the squared $\mathcal{L}_2$ norm. (d) illustrates $t(x_\text{ori})$ for $t \sim \mathcal{T}$, where $\mathcal{T}$ is a distribution over transformations including TPS (see the TPS appendix), scaling, rotation, and cropping. (e) Samples from $p_\text{adv}(\cdot \mid x_\text{ori}, y_\text{tar}) \propto \exp(-c_1\,\mathcal{D}(x_\text{ori}, x_\text{adv}))\exp(-c_2\, f(x_\text{adv}, y_\text{tar}))$, where $\mathcal{D}$ is the $\mathcal{L}_2$ norm, $f$ is the cross-entropy $f_\text{CE}$, $x_\text{ori}$ are the first 36 images from the MNIST test set, $y_\text{tar}$ is set to 1, $c_1 = 10^{-3}$, and $c_2 = 10^{-2}$. A green border marks a successful attack; a red border marks a failure.
  • Figure 3: Comparative visual analysis of PGD, Prob CW, StAdv, and our proposed method applied to MNIST. The surrogate classifier used is MadryNet with adversarial training. Images are framed with a green border to indicate a successful white-box attack, whereas a red border signifies a failed attack.
  • Figure 4: Comparative visual analysis of NCF, cAdv, ACE, ColorFool, and our proposed method applied to ImageNet. The surrogate classifier used is ResNet50. For additional examples, refer to the appendix of qualitative results.
  • Figure 5: Average attack success rate across black-box transferability and defense methods vs. human annotation success rate, as reported in the ImageNet results table.
  • ...and 4 more figures
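The product form in the Figure 2 caption, $p_\text{adv} \propto \exp(-c_1\,\mathcal{D})\exp(-c_2\, f)$, can be sampled with any gradient-based MCMC method. Below is a minimal pure-Python sketch using unadjusted Langevin dynamics; we do not claim this is the sampler or setup used in the paper. Assumptions: a hypothetical 2-D, 3-class linear softmax model (`W`, `B`) stands in for the victim classifier, $\mathcal{D}$ is taken as the squared $\mathcal{L}_2$ distance (as in panel (c)), $f$ is the cross-entropy, and the constants `c1 = 0.5`, `c2 = 5.0` are chosen for this toy problem rather than taken from the paper.

```python
import math
import random

# Hypothetical toy victim classifier (NOT from the paper): a linear softmax
# model on 2-D inputs with 3 classes, logits = W @ x + B.
W = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]
B = [0.0, 0.0, 0.0]


def predict_proba(x):
    """Softmax probabilities of the toy classifier for input x."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x):
    """Predicted class (argmax of the probabilities)."""
    p = predict_proba(x)
    return max(range(len(p)), key=p.__getitem__)


def energy_grad(x, x_ori, y_tar, c1, c2):
    """Gradient of U(x) = c1 * ||x - x_ori||^2 + c2 * CE(x, y_tar).

    -U equals log p_adv up to an additive constant, assuming D is the squared
    L2 distance and f is the cross-entropy of the toy classifier.
    """
    p = predict_proba(x)
    # dCE/dlogits = p - onehot(y_tar); chain rule through logits = W @ x + B.
    g_logits = [pk - (1.0 if k == y_tar else 0.0) for k, pk in enumerate(p)]
    grad = []
    for j in range(len(x)):
        g_ce = sum(W[k][j] * g_logits[k] for k in range(len(W)))
        grad.append(2.0 * c1 * (x[j] - x_ori[j]) + c2 * g_ce)
    return grad


def langevin_sample(x_ori, y_tar, c1, c2, step=1e-2, n_steps=20000,
                    burn_in=2000, thin=10, seed=0):
    """Unadjusted Langevin dynamics targeting p_adv proportional to exp(-U):
    x <- x - (step / 2) * grad U(x) + sqrt(step) * N(0, I)."""
    rng = random.Random(seed)
    x = list(x_ori)  # start the chain at the original input
    samples = []
    for t in range(n_steps):
        g = energy_grad(x, x_ori, y_tar, c1, c2)
        x = [xj - 0.5 * step * gj + math.sqrt(step) * rng.gauss(0.0, 1.0)
             for xj, gj in zip(x, g)]
        # Keep every `thin`-th state after the burn-in period.
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples.append(list(x))
    return samples


# Usage: x_ori is classified as class 0; draw samples targeting class 1.
x_ori = [1.0, 0.0]
samples = langevin_sample(x_ori, y_tar=1, c1=0.5, c2=5.0)
hit_rate = sum(predict(s) == 1 for s in samples) / len(samples)
print(f"predicted class of x_ori: {predict(x_ori)}")
print(f"fraction of samples classified as y_tar: {hit_rate:.2f}")
```

In this toy setting, raising `c1` keeps samples closer to `x_ori`, while raising `c2` pushes them more firmly toward the target class, mirroring the roles of $c_1$ and $c_2$ in panel (e).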

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Theorem 1
  • proof
  • Theorem 2
  • proof