
An Alternative Surrogate Loss for PGD-based Adversarial Testing

Sven Gowal, Jonathan Uesato, Chongli Qin, Po-Sen Huang, Timothy Mann, Pushmeet Kohli

TL;DR

Projected Gradient Descent (PGD) attacks are a standard tool for probing neural network robustness, but fixed hyperparameters and single-loss surrogates can miss hard-to-find adversaries. The authors introduce MultiTargeted, a PGD variant that optimizes multiple per-class logit-difference losses across all non-true classes, improving the likelihood of discovering worst-case perturbations under a fixed compute budget. They prove conditions under which MultiTargeted is guaranteed to find optimal perturbations (e.g., convex propagation or near-linear local behavior) and demonstrate strong empirical performance across MNIST, CIFAR-10, and ImageNet, achieving state-of-the-art attack strength on several leaderboards. The work also provides practical guidance on tuning PGD hyperparameters and shows that combining regular PGD and MultiTargeted can yield robust, scalable adversarial testing across datasets.
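The TL;DR notes that combining regular PGD and MultiTargeted can make adversarial testing more robust. One common way to combine attacks is to take the per-example worst case: an example counts as robust only if every attack fails on it. This is sketched below as an assumption, not as the paper's exact evaluation protocol. The helper `robust_accuracy` is hypothetical. It also assumes that clean examples the model already misclassifies are recorded as attack successes.

```python
def robust_accuracy(*attack_results):
    """Fraction of examples on which every attack failed.

    Each argument is a list of booleans, one per example, where True means
    that attack found a misclassifying perturbation for that example.
    An example counts as robust only if no attack succeeded on it.
    """
    if not attack_results:
        raise ValueError("need at least one attack")
    n = len(attack_results[0])
    if n == 0 or any(len(r) != n for r in attack_results):
        raise ValueError("all attacks must cover the same non-empty set of examples")
    # Per-example worst case: any single success breaks robustness.
    robust = sum(1 for i in range(n) if not any(r[i] for r in attack_results))
    return robust / n
```

For instance, with `pgd = [True, False, False, True]` and `mt = [False, True, False, True]`, each attack alone leaves an accuracy of 0.5. `robust_accuracy(pgd, mt)` drops to 0.25 because the two attacks break different examples.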

Abstract

Adversarial testing methods based on Projected Gradient Descent (PGD) are widely used to search for norm-bounded perturbations that cause the inputs of neural networks to be misclassified. This paper takes a deeper look at these methods and explains the effect of different hyperparameters (i.e., optimizer, step size and surrogate loss). We introduce the concept of MultiTargeted testing, which makes clever use of alternative surrogate losses, and explain when and how MultiTargeted is guaranteed to find optimal perturbations. Finally, we demonstrate that MultiTargeted outperforms more sophisticated methods and often requires fewer iterative steps than other variants of PGD found in the literature. Notably, MultiTargeted ranks first on MadryLab's white-box MNIST and CIFAR-10 leaderboards, reducing the accuracy of their MNIST model to 88.36% (with $\ell_\infty$ perturbations of $\epsilon = 0.3$) and the accuracy of their CIFAR-10 model to 44.03% (at $\epsilon = 8/255$). MultiTargeted also ranks first on the TRADES leaderboard, reducing the accuracy of their CIFAR-10 model to 53.07% (with $\ell_\infty$ perturbations of $\epsilon = 0.031$).
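The sketch below makes the per-class logit-difference losses concrete. It applies MultiTargeted-style testing to a hypothetical linear model with logits $z = Wx + b$ and an $\ell_\infty$ budget $\epsilon$. For each non-true class $t$, it runs PGD with signed-gradient steps on the targeted loss $z_t - z_y$. It then keeps the perturbation with the largest margin loss $\max_{j \neq y} z_j - z_y$.

It rests on several assumptions:

- The model is linear, so the gradient of $z_t - z_y$ is simply $w_t - w_y$. A real network would need automatic differentiation.
- The perturbation starts at zero rather than at a random point.
- Inputs are not clipped to a valid pixel range.

All function names are illustrative, not taken from the paper's code.

```python
def logits(W, b, x):
    """Logits z = W x + b of a linear classifier (W is a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def margin_loss(z, y):
    """max_{j != y} z_j - z_y; a positive value means the input is misclassified."""
    return max(z[j] for j in range(len(z)) if j != y) - z[y]


def sign(v):
    return (v > 0) - (v < 0)


def targeted_pgd(W, x, y, t, eps, step, steps):
    """PGD on the targeted loss z_t - z_y inside the l_inf ball of radius eps."""
    # Assumption: linear model, so d(z_t - z_y)/dx = w_t - w_y (constant).
    grad = [wt - wy for wt, wy in zip(W[t], W[y])]
    delta = [0.0] * len(x)  # zero start; PGD attacks usually start at random points
    for _ in range(steps):
        # Signed-gradient ascent step, then projection back onto [-eps, eps].
        delta = [min(eps, max(-eps, d + step * sign(g))) for d, g in zip(delta, grad)]
    return [xi + d for xi, d in zip(x, delta)]


def multitargeted(W, b, x, y, eps, step=0.1, steps=20):
    """Run one targeted PGD per non-true class and keep the worst case found."""
    best_x = list(x)
    best_loss = margin_loss(logits(W, b, x), y)
    for t in range(len(b)):
        if t == y:
            continue
        x_adv = targeted_pgd(W, x, y, t, eps, step, steps)
        loss = margin_loss(logits(W, b, x_adv), y)
        if loss > best_loss:
            best_x, best_loss = x_adv, loss
    return best_x, best_loss
```

Consider `W = [[1, 0], [0, 1], [-1, 0]]`, `b = [0, 0, 0]`, `x = [1, 0]`, `y = 0` and `eps = 0.6`. Targeting class 1 moves the input to `[0.4, 0.6]`, which is misclassified with a margin of about 0.2. Targeting class 2 leaves the margin negative, so `multitargeted` returns the class-1 perturbation.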

Paper Structure

This paper contains 22 sections, 2 theorems, 7 equations, 8 figures, 2 tables, 3 algorithms.

Key Result

Theorem 3.1

Given a globally linear model $f_\theta$ with $C$ output logits, for any input $x$, MultiTargeted is stronger than regular PGD attacks that use the margin loss (or cross-entropy loss) when the number of restarts is greater than or equal to $C - 1$ (i.e., $N_\textrm{r} \geq C - 1$) and the adversarial input …
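For intuition behind Theorem 3.1, consider the linear case under two simplifying assumptions, which may be narrower than the theorem's own conditions:

- The logits are $z_j = w_j^\top x + b_j$.
- The adversarial input set is the $\ell_\infty$ ball of radius $\epsilon$ around $x$, with no clipping to a valid input range.

Each targeted loss then has a closed-form maximizer, and the worst-case margin is the largest of the $C - 1$ targeted optima. This is a sketch, not the paper's proof.

```latex
\begin{aligned}
\max_{\|\delta\|_\infty \le \epsilon} \big(z_t - z_y\big)(x+\delta)
  &= (w_t - w_y)^\top x + (b_t - b_y) + \epsilon \, \|w_t - w_y\|_1,
  \qquad \delta^\star_t = \epsilon \, \mathrm{sign}(w_t - w_y), \\
\max_{\|\delta\|_\infty \le \epsilon} \; \max_{t \neq y} \big(z_t - z_y\big)(x+\delta)
  &= \max_{t \neq y} \; \max_{\|\delta\|_\infty \le \epsilon} \big(z_t - z_y\big)(x+\delta).
\end{aligned}
```

The second line holds because two maxima can always be exchanged. In this sketch, a targeted PGD run with enough signed-gradient steps reaches a maximizer for its class $t$. Assigning one restart to each non-true class ($N_\textrm{r} \geq C - 1$) therefore covers every term of the outer maximum. A margin-loss attack, by contrast, only follows whichever class currently has the largest logit.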

Figures (8)

  • Figure 1: The first panel shows an example motivating why MultiTargeted can outperform other untargeted attacks. In this example, a regular PGD-based attack with 2 restarts achieves a 75% success rate, compared with 100% for MultiTargeted. The second panel shows a more extreme example, in which a regular PGD-based attack with 2 restarts achieves only a 25% success rate. In both panels, the areas shaded in gray lie outside the adversarial input set.
  • Figure 2: Success rate of a regular PGD-based attack with $C-1$ restarts against a random linear classifier as a function of the number of confusing classes. MultiTargeted achieves a 100% success rate for every number of confusing classes.
  • Figure 3: The first panel shows the accuracy under attacks of size $\epsilon = 8/255$ for four different CIFAR-10 models. The solid lines are the accuracies obtained by MultiTargeted for different values of $T$, while the dashed lines of the corresponding color are the accuracies obtained by regular PGD for the same computational budget. The second panel shows the accuracy under a MultiTargeted attack of size $\epsilon = 8/255$ for the model of Madry et al. (2017) as a function of the number of PGD steps $K$ and the number of restarts $N_\textrm{i}$.
  • Figure 4: Accuracy under attacks of size $\epsilon = 16 / 255$ for three different ImageNet models. The solid lines are the accuracies obtained by MultiTargeted for different values of $T$, while the dashed lines of the corresponding color are accuracies obtained by regular PGD (for the same computational budget).
  • Figure 5: Examples with non-convex adversarial input sets. In all panels, the adversarial input set is highlighted in blue and the region in purple is where misclassification occurs. The solid black line marks the boundary of the region where the input is classified as class 2. The dashed line is where the two planes $z_1 - z_0$ and $z_2 - z_0$ intersect. When $\xi^{(0)}$ is sampled within the region highlighted in red, the attack corresponding to that panel succeeds. The top row shows an example for which regular PGD is more successful, and the bottom row shows an example for which MultiTargeted is more successful.
  • ...and 3 more figures
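The success rates in Figures 1 and 2 count how often at least one restart finds an adversarial example. A back-of-the-envelope aid helps read these numbers, offered as an assumption rather than the paper's analysis. If each restart of regular PGD independently succeeded with probability $p$, then $N$ restarts would succeed with probability $1 - (1 - p)^N$. Under that independence assumption, the 75% rate with 2 restarts in Figure 1 corresponds to $p = 0.5$ per restart, and the 25% rate to $p \approx 0.134$. MultiTargeted instead assigns restarts to target classes, rather than relying on random initialization to reach the right one. The helper names below are hypothetical.

```python
def restart_success(p, n_restarts):
    """P(at least one of n independent restarts succeeds), each with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** n_restarts


def per_restart_rate(total, n_restarts):
    """Invert restart_success: the per-restart p implied by an overall success rate."""
    if not 0.0 <= total <= 1.0:
        raise ValueError("total must lie in [0, 1]")
    return 1.0 - (1.0 - total) ** (1.0 / n_restarts)
```

For example, `restart_success(0.5, 2)` gives 0.75, and `per_restart_rate(0.25, 2)` is about 0.134.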

Theorems & Definitions (2)

  • Theorem 3.1
  • Theorem 3.2