
Adversarial Training Should Be Cast as a Non-Zero-Sum Game

Alexander Robey, Fabian Latorre, George J. Pappas, Hamed Hassani, Volkan Cevher

TL;DR

A novel non-zero-sum bilevel formulation of adversarial training, wherein each player optimizes a different objective function, yields a simple algorithmic framework that matches and in some cases outperforms state-of-the-art attacks, attains robustness comparable to that of standard adversarial training algorithms, and does not suffer from robust overfitting.

Abstract

One prominent approach toward resolving the adversarial vulnerability of deep neural networks is the two-player zero-sum paradigm of adversarial training, in which predictors are trained against adversarially chosen perturbations of data. Despite the promise of this approach, algorithms based on this paradigm have not engendered sufficient levels of robustness and suffer from pathological behavior like robust overfitting. To understand this shortcoming, we first show that the surrogate-based relaxation commonly used in adversarial training algorithms voids all guarantees on the robustness of trained classifiers. The identification of this pitfall informs a novel non-zero-sum bilevel formulation of adversarial training, wherein each player optimizes a different objective function. Our formulation yields a simple algorithmic framework that matches and in some cases outperforms state-of-the-art attacks, attains comparable levels of robustness to standard adversarial training algorithms, and does not suffer from robust overfitting.

Paper Structure

This paper contains 19 sections, 1 theorem, 40 equations, 4 figures, 2 tables, 3 algorithms.

Key Result

Proposition 1

Given a fixed data pair $(X,Y)$, let $\eta^\star$ denote any maximizer of $M_\theta(X+\eta,Y)_j$ over the classes $j\in[K]-\{Y\}$ and perturbations $\eta\in\mathbb{R}^d$ satisfying $\|\eta\|\leq\epsilon$, i.e.,

$$(j^\star,\eta^\star)\in\arg\max_{j\in[K]-\{Y\},\;\eta:\,\|\eta\|\leq\epsilon} M_\theta(X+\eta,Y)_j.$$

Then if $M_\theta(X+\eta^\star,Y)_{j^\star} > 0$, $\eta^\star$ induces a misclassification and satisfies the constraint in eq:bilevel-const-misclassification, meaning ...
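The condition in Proposition 1 can be illustrated on a toy problem. The sketch below rests on several assumptions that this excerpt does not state: that $M_\theta(X,Y)_j$ is the logit difference $f_\theta(X)_j - f_\theta(X)_Y$, that the norm is $\ell_\infty$, and that the classifier is linear, so the maximizer over $\eta$ has a closed form for each class $j$. All function names are hypothetical and chosen for illustration only.

```python
def linear_logits(W, b, x):
    """Logits of a toy linear classifier f(x) = W x + b (an assumption)."""
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def sign(v):
    return (v > 0) - (v < 0)


def best_margin_perturbation(W, b, x, y, eps):
    """Maximize the assumed margin logits[j] - logits[y] over j != y and
    ||eta||_inf <= eps. For a linear model the inner maximizer is
    eta = eps * sign(W[j] - W[y]), so each class is solved exactly."""
    best_value, best_j, best_eta = float("-inf"), None, None
    for j in range(len(W)):
        if j == y:
            continue  # the true class is excluded, as in j in [K] - {Y}
        direction = [wj - wy for wj, wy in zip(W[j], W[y])]
        eta = [eps * sign(d) for d in direction]
        logits = linear_logits(W, b, [xi + ei for xi, ei in zip(x, eta)])
        value = logits[j] - logits[y]
        if value > best_value:
            best_value, best_j, best_eta = value, j, eta
    return best_value, best_j, best_eta


# Toy 3-class problem; the clean input is classified correctly as class 0.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x, y, eps = [1.0, 0.5], 0, 0.5

value, j_star, eta_star = best_margin_perturbation(W, b, x, y, eps)
perturbed = linear_logits(W, b, [xi + ei for xi, ei in zip(x, eta_star)])
predicted = max(range(len(perturbed)), key=perturbed.__getitem__)

# A positive optimal margin means the perturbed input is misclassified.
print(value, j_star, predicted != y)
```

In this toy case the optimal margin is positive, so the perturbed point is assigned to class `j_star` rather than `y`. For a deep network the inner maximization has no closed form and would have to be approximated, for example by gradient ascent on each class margin.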

Figures (4)

  • Figure 1: BETA does not suffer from robust overfitting. We plot the learning curves against a PGD$^{20}$ adversary for PGD$^{10}$ and BETA-AT$^{10}$. Observe that although PGD displays robust overfitting after the first learning rate decay step, BETA-AT does not suffer from this pitfall.
  • Figure 2: Adversarial training performance-speed trade-off. Each point is annotated with the number of steps with which the corresponding algorithm was run. Observe that robust overfitting is eliminated by BETA, but that this comes at the cost of increased computational overhead. This reveals an expected performance-speed trade-off for our algorithm.
  • Figure 3: Adversarial evaluation timing comparison. The running times for evaluating the top models on RobustBench using AutoAttack and BETA, with the same settings as in Table 2, are reported. On average, BETA is 5.11 times faster than AutoAttack.
  • Figure 4: Plot of the function to be maximized in \ref{eq:final_problem_ce}. We subtract $y=2.5$ for ease of viewing.

Theorems & Definitions (2)

  • Example 1
  • Proposition 1