Adversarial Robustness Against the Union of Multiple Perturbation Models

Pratyush Maini, Eric Wong, J. Zico Kolter

TL;DR

The paper addresses the difficulty of training a classifier that is robust to several adversarial perturbation models at once. It extends PGD-based adversarial training with Multi Steepest Descent (MSD), a single attack that, at each iteration, takes the worst case over the steepest descent directions for the $\ell_\infty$, $\ell_2$, and $\ell_1$ perturbation models, so that training directly targets the worst-case loss over their union. On MNIST and CIFAR-10, MSD achieves higher robust accuracy against the union than the simple Max and Avg aggregation baselines, and it avoids the gradient-masking pitfalls seen in some prior approaches. On CIFAR-10, it reaches 47.0% adversarial accuracy against the union with radii (0.03, 0.5, 12), compared with 40.6% for previous approaches. The method trains standard architectures without modification.

Abstract

Owing to the susceptibility of deep learning systems to adversarial attacks, there has been a great deal of work in developing (both empirically and certifiably) robust classifiers. While most work has defended against a single type of attack, recent work has looked at defending against multiple perturbation models using simple aggregations of multiple attacks. However, these methods can be difficult to tune, and can easily result in imbalanced degrees of robustness to individual perturbation models, resulting in a sub-optimal worst-case loss over the union. In this work, we develop a natural generalization of the standard PGD-based procedure to incorporate multiple perturbation models into a single attack, by taking the worst-case over all steepest descent directions. This approach has the advantage of directly converging upon a trade-off between different perturbation models which minimizes the worst-case performance over the union. With this approach, we are able to train standard architectures which are simultaneously robust against $\ell_\infty$, $\ell_2$, and $\ell_1$ attacks, outperforming past approaches on the MNIST and CIFAR10 datasets and achieving adversarial accuracy of 47.0% against the union of ($\ell_\infty$, $\ell_2$, $\ell_1$) perturbations with radius = (0.03, 0.5, 12) on the latter, improving upon previous approaches which achieve 40.6% accuracy.
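The idea of "taking the worst-case over all steepest descent directions" can be made concrete with a small sketch. The NumPy code below is an illustration under stated assumptions, not the authors' implementation: the function names (`steepest_step`, `project`, `msd_step`), the flat-vector input, the per-norm step sizes, and the sort-based $\ell_1$ projection are all choices made here for clarity. It also omits details a real attack would need, such as random initialization and clipping to the valid pixel range, and the paper's own $\ell_1$ step may differ.

```python
import numpy as np

def steepest_step(grad, alpha, norm):
    """Step of size alpha (measured in `norm`) that most increases the linearized loss."""
    if norm == "linf":
        return alpha * np.sign(grad)                       # move every coordinate
    if norm == "l2":
        return alpha * grad / (np.linalg.norm(grad) + 1e-12)  # normalized gradient
    if norm == "l1":
        step = np.zeros_like(grad)
        i = np.argmax(np.abs(grad))                        # move only the largest coordinate
        step[i] = alpha * np.sign(grad[i])
        return step
    raise ValueError(f"unknown norm: {norm}")

def project(delta, eps, norm):
    """Euclidean projection of a flat vector onto the `norm` ball of radius eps."""
    if norm == "linf":
        return np.clip(delta, -eps, eps)
    if norm == "l2":
        n = np.linalg.norm(delta)
        return delta if n <= eps else delta * (eps / n)
    if norm == "l1":
        if np.abs(delta).sum() <= eps:
            return delta
        # Sort-based projection: find the soft-threshold theta that
        # brings the l1 norm down to exactly eps.
        u = np.sort(np.abs(delta))[::-1]
        css = np.cumsum(u)
        ks = np.arange(1, len(u) + 1)
        rho = np.nonzero(u * ks > css - eps)[0][-1]
        theta = (css[rho] - eps) / (rho + 1)
        return np.sign(delta) * np.maximum(np.abs(delta) - theta, 0.0)
    raise ValueError(f"unknown norm: {norm}")

def msd_step(delta, x, loss_fn, grad_fn, eps, alpha):
    """One MSD-style iteration: try one projected step per norm, keep the worst case."""
    grad = grad_fn(x + delta)
    candidates = [
        project(delta + steepest_step(grad, alpha[p], p), eps[p], p)
        for p in eps
    ]
    return max(candidates, key=lambda d: loss_fn(x + d))

# Toy usage with a hypothetical linear "loss" w . z, whose gradient is w.
w = np.array([3.0, -1.0, 0.5, 0.0])
x = np.zeros(4)
eps = {"linf": 0.1, "l2": 0.3, "l1": 0.5}      # illustrative radii, not the paper's
alpha = {"linf": 0.05, "l2": 0.2, "l1": 0.5}   # illustrative step sizes
delta = np.zeros(4)
for _ in range(5):
    delta = msd_step(delta, x, lambda z: w @ z, lambda z: w, eps, alpha)
```

Because every candidate is projected onto its own ball before the comparison, the returned perturbation always lies inside at least one of the three balls, that is, inside their union. In adversarial training, the model would then be updated on `x + delta` in place of the single-norm PGD example.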

Paper Structure

This paper contains 47 sections, 21 equations, 10 figures, 7 tables, and 2 algorithms.

Figures (10)

  • Figure 1: A depiction of the steepest descent directions for $\ell_\infty$, $\ell_2$, and $\ell_1$ norms. The gradient is the black arrow, and the $\alpha$ radius step sizes and their corresponding steepest descent directions $\ell_\infty$, $\ell_2$, and $\ell_1$ are shown in blue, red, and green respectively.
  • Figure 2: Robustness curves showing the adversarial accuracy of the MNIST models trained with MSD, Avg, and Max against the $\ell_\infty$ (left), $\ell_2$ (middle), and $\ell_1$ (right) perturbation models over a range of $\epsilon$.
  • Figure 3: A view of each of the (5x5) learned filters of the first layer of a CNN robust to $\ell_\infty$ attacks. The singular sharp values are characteristic features of models robust to $\ell_\infty$ attacks.
  • Figure 4: Among all the models trained using the MSD, Max and Avg methods during our hyperparameter search, we plot the percentage of models for each method that achieve robust accuracies greater than a particular threshold (against the union of $\ell_\infty, \ell_1, \ell_2$ attacks).
  • Figure 5: Robustness curves showing the adversarial accuracy of the CIFAR10 models trained with MSD, Avg, and Max against the $\ell_\infty$ (left), $\ell_2$ (middle), and $\ell_1$ (right) perturbation models over a range of $\epsilon$.
  • ...and 5 more figures