Adversarial Training Should Be Cast as a Non-Zero-Sum Game

Alexander Robey; Fabian Latorre; George J. Pappas; Hamed Hassani; Volkan Cevher

TL;DR

A novel non-zero-sum bilevel formulation of adversarial training, wherein each player optimizes a different objective function, yields a simple algorithmic framework that matches and in some cases outperforms state-of-the-art attacks, attains comparable levels of robustness to standard adversarial training algorithms, and does not suffer from robust overfitting.

Abstract

One prominent approach toward resolving the adversarial vulnerability of deep neural networks is the two-player zero-sum paradigm of adversarial training, in which predictors are trained against adversarially chosen perturbations of data. Despite the promise of this approach, algorithms based on this paradigm have not engendered sufficient levels of robustness and suffer from pathological behavior like robust overfitting. To understand this shortcoming, we first show that the surrogate-based relaxation commonly used in adversarial training algorithms voids all guarantees on the robustness of trained classifiers. The identification of this pitfall informs a novel non-zero-sum bilevel formulation of adversarial training, wherein each player optimizes a different objective function. Our formulation yields a simple algorithmic framework that matches and in some cases outperforms state-of-the-art attacks, attains comparable levels of robustness to standard adversarial training algorithms, and does not suffer from robust overfitting.
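
For orientation, the contrast drawn in the abstract can be sketched as follows. The notation here is an assumption that this summary does not define: $f_\theta$ denotes the classifier's logits, $\ell$ a surrogate loss such as cross-entropy, and $M_\theta(x,y)_j = f_\theta(x)_j - f_\theta(x)_y$ the negative margin of class $j$, matching its use in Proposition 1. In the zero-sum paradigm, both players act on the same surrogate objective:

$$\min_\theta \; \mathbb{E}_{(X,Y)} \Big[ \max_{\eta:\,\|\eta\|\leq\epsilon} \ell\big(f_\theta(X+\eta),\,Y\big) \Big].$$

In a non-zero-sum bilevel formulation, the defender still minimizes a surrogate loss, but the attacker maximizes a different objective, the negative margin:

$$\min_\theta \; \mathbb{E}_{(X,Y)} \, \ell\big(f_\theta(X+\eta^\star),\,Y\big) \quad \text{subject to} \quad \eta^\star \in \operatorname*{arg\,max}_{\eta:\,\|\eta\|\leq\epsilon} \; \max_{j\in[K]-\{Y\}} M_\theta(X+\eta,Y)_j.$$

This is a sketch under the stated notational assumptions; the paper itself gives the precise statement.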

Paper Structure (19 sections, 1 theorem, 40 equations, 4 figures, 2 tables, 3 algorithms)

Introduction
The promises and pitfalls of adversarial training
Preliminaries: Training DNNs with surrogate losses
The pervasive setting of adversarial examples
Surrogate-based approaches to robustness
Non-zero-sum formulation of adversarial training
Decoupling adversarial attacks and defenses
Putting the pieces together: Non-zero-sum adversarial training
Algorithms
Experiments
Related work
Conclusion
Appendices
Proof of Proposition 1
Smooth reformulation of the lower level
...and 4 more sections

Key Result

Proposition 1

Given a fixed data pair $(X,Y)$, let $\eta^\star$ denote any maximizer of $M_\theta(X+\eta,Y)_j$ over the classes $j\in[K]-\{Y\}$ and perturbations $\eta\in\mathbb{R}^d$ satisfying $\|\eta\|\leq\epsilon$, i.e.,

$$(j^\star,\eta^\star)\in\operatorname*{arg\,max}_{j\in[K]-\{Y\},\;\eta:\,\|\eta\|\leq\epsilon} M_\theta(X+\eta,Y)_j.$$

Then if $M_\theta(X+\eta^\star,Y)_{j^\star} > 0$, $\eta^\star$ induces a misclassification and satisfies the constraint in eq:bilevel-const-misclassification, meaning ...
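
The maximization in Proposition 1 can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes a linear classifier (so the gradient of the negative margin is available in closed form), an $\ell_2$ perturbation ball, and plain projected gradient ascent. All function names (`logits`, `negative_margin`, `project_l2`, `best_targeted_attack`) are hypothetical. For each incorrect class $j$, the sketch maximizes $M_\theta(X+\eta,Y)_j$ over the ball and then keeps the class and perturbation with the largest value. A positive value signals a misclassification.

```python
import math

def logits(W, b, x):
    """Linear classifier f(x) = W x + b, with W given as K rows of length d."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def negative_margin(W, b, x, y, j):
    """M(x, y)_j = f(x)_j - f(x)_y; positive means class j beats the true class y."""
    f = logits(W, b, x)
    return f[j] - f[y]

def project_l2(eta, eps):
    """Project eta onto the l2 ball of radius eps (assumed norm, for illustration)."""
    norm = math.sqrt(sum(e * e for e in eta))
    if norm <= eps:
        return list(eta)
    return [e * eps / norm for e in eta]

def best_targeted_attack(W, b, x, y, eps, steps=20, lr=0.1):
    """Maximize the negative margin per incorrect class, then keep the best one."""
    best_j, best_eta, best_margin = None, None, -math.inf
    for j in range(len(W)):
        if j == y:
            continue  # only incorrect classes j in [K] - {Y} are considered
        # For a linear model, the gradient of M_j with respect to eta is W[j] - W[y].
        grad = [wj - wy for wj, wy in zip(W[j], W[y])]
        eta = [0.0] * len(x)
        for _ in range(steps):
            # Gradient ascent step followed by projection back onto the ball.
            eta = project_l2([e + lr * g for e, g in zip(eta, grad)], eps)
        x_adv = [xi + ei for xi, ei in zip(x, eta)]
        margin = negative_margin(W, b, x_adv, y, j)
        if margin > best_margin:
            best_j, best_eta, best_margin = j, eta, margin
    return best_j, best_eta, best_margin

if __name__ == "__main__":
    # Toy three-class problem in two dimensions (illustrative values only).
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    b = [0.0, 0.0, 0.0]
    j_star, eta_star, m_star = best_targeted_attack(W, b, [1.0, 0.5], 0, eps=0.5)
    print("best class:", j_star, "margin:", m_star, "misclassified:", m_star > 0)
```

For a deep network, the closed-form gradient would be replaced by automatic differentiation. The outer loop over classes and the final selection by margin would stay the same.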

Figures (4)

Figure 1: BETA does not suffer from robust overfitting. We plot the learning curves against a PGD$^{20}$ adversary for PGD$^{10}$ and BETA-AT$^{10}$. Observe that although PGD displays robust overfitting after the first learning rate decay step, BETA-AT does not suffer from this pitfall.
Figure 2: Adversarial training performance-speed trade-off. Each point is annotated with the number of steps with which the corresponding algorithm was run. Observe that robust overfitting is eliminated by BETA, but that this comes at the cost of increased computational overhead. This reveals an expected performance-speed trade-off for our algorithm.
Figure 3: Adversarial evaluation timing comparison. We report the running times for evaluating the top models on RobustBench using AutoAttack and BETA, with the same settings as in Table 2. On average, BETA is 5.11 times faster than AutoAttack.
Figure 4: Plot of the function to be maximized in eq:final_problem_ce. We subtract $y=2.5$ for ease of viewing.

Theorems & Definitions (2)

Example 1
Proposition 1
