An Alternative Surrogate Loss for PGD-based Adversarial Testing
Sven Gowal, Jonathan Uesato, Chongli Qin, Po-Sen Huang, Timothy Mann, Pushmeet Kohli
TL;DR
Projected Gradient Descent (PGD) attacks are a standard tool for probing neural network robustness, but fixed hyperparameters and a single surrogate loss can miss hard-to-find adversarial examples. The authors introduce MultiTargeted, a PGD variant that optimizes a separate logit-difference loss for each incorrect class, improving the likelihood of discovering worst-case perturbations under a fixed compute budget. They give conditions under which MultiTargeted is guaranteed to find optimal perturbations (e.g., convex propagation or near-linear local behavior) and demonstrate strong empirical performance on MNIST, CIFAR-10, and ImageNet, achieving state-of-the-art attack strength on several leaderboards. The work also provides practical guidance on tuning PGD hyperparameters and shows that combining regular PGD with MultiTargeted can yield robust, scalable adversarial testing across datasets.
Abstract
Adversarial testing methods based on Projected Gradient Descent (PGD) are widely used for searching norm-bounded perturbations that cause the inputs of neural networks to be misclassified. This paper takes a deeper look at these methods and explains the effect of different hyperparameters (i.e., optimizer, step size and surrogate loss). We introduce the concept of MultiTargeted testing, which makes clever use of alternative surrogate losses, and explain when and how MultiTargeted is guaranteed to find optimal perturbations. Finally, we demonstrate that MultiTargeted outperforms more sophisticated methods and often requires fewer iterative steps than other variants of PGD found in the literature. Notably, MultiTargeted ranks first on MadryLab's white-box MNIST and CIFAR-10 leaderboards, reducing the accuracy of their MNIST model to 88.36% (with $\ell_\infty$ perturbations of $\epsilon = 0.3$) and the accuracy of their CIFAR-10 model to 44.03% (at $\epsilon = 8/255$). MultiTargeted also ranks first on the TRADES leaderboard, reducing the accuracy of their CIFAR-10 model to 53.07% (with $\ell_\infty$ perturbations of $\epsilon = 0.031$).
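To make the idea of per-class surrogate losses concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy linear classifier (so the gradient of each logit difference is available in closed form), a plain signed-gradient update, and no clipping to a valid input range. The names `logits`, `multitargeted_attack`, `step_size` and `steps` are hypothetical. For each incorrect class, it runs $\ell_\infty$-projected ascent on the difference between that class's logit and the true-class logit, then keeps the perturbation with the largest margin.

```python
def logits(W, b, x):
    """Logits of a toy linear classifier: z_k = W[k] . x + b[k]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def sign(v):
    """Sign of a scalar: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def multitargeted_attack(W, b, x0, true_label, eps, step_size=0.1, steps=20):
    """Run one projected signed-gradient attack per incorrect class.

    Returns (best_margin, best_x), where the margin is z_target - z_true
    at the perturbed input. A positive margin means misclassification.
    """
    best_margin, best_x = float("-inf"), list(x0)
    for target in range(len(W)):
        if target == true_label:
            continue
        # Assumption: for a linear model, the gradient of
        # z_target - z_true with respect to x is W[target] - W[true_label].
        grad = [wt - wy for wt, wy in zip(W[target], W[true_label])]
        x = list(x0)
        for _ in range(steps):
            # Ascent step on the logit-difference surrogate loss.
            x = [xi + step_size * sign(g) for xi, g in zip(x, grad)]
            # Project back onto the l_inf ball of radius eps around x0.
            x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
        z = logits(W, b, x)
        margin = z[target] - z[true_label]
        if margin > best_margin:
            best_margin, best_x = margin, x
    return best_margin, best_x


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    b = [0.0, 0.0, 0.0]
    margin, x_adv = multitargeted_attack(W, b, [1.0, 0.5], true_label=0, eps=0.3)
    print("misclassified:", margin > 0, "x_adv:", x_adv)
```

In this toy setting each per-target problem is linear, so the projected ascent reaches the corner of the $\ell_\infty$ ball that maximizes the chosen logit difference. For a real network the gradient would come from automatic differentiation and the per-target runs would share a fixed step budget.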
