MMA Training: Direct Input Space Margin Maximization through Adversarial Training

Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, Ruitong Huang

TL;DR

The paper tackles the limitation of fixed-perturbation adversarial training by reframing robustness as maximizing per-example input-space margins $d_\theta(x,y)$. It introduces Max-Margin Adversarial (MMA) training, which directly maximizes margins up to a per-example threshold $d_{\max}$ using a cross-entropy surrogate and an approximate shortest perturbation $\delta^*$. The authors derive gradient relationships for margin maximization under smooth and non-smooth settings, propose AN-PGD to obtain $\delta^*$, and augment training with a clean loss to stabilize optimization. Empirical results on MNIST and CIFAR-10 across $\ell_\infty$ and $\ell_2$ show MMA improves robustness with reduced hyperparameter sensitivity, performing competitively with ensembles and TRADES. The work provides both theoretical insight and practical algorithms for margin-based defenses in adversarial robustness.

Abstract

We study adversarial robustness of neural networks from a margin maximization perspective, where margins are defined as the distances from inputs to a classifier's decision boundary. Our study shows that maximizing margins can be achieved by minimizing the adversarial loss on the decision boundary at the "shortest successful perturbation", demonstrating a close connection between adversarial losses and the margins. We propose Max-Margin Adversarial (MMA) training to directly maximize the margins to achieve adversarial robustness. Instead of adversarial training with a fixed $\epsilon$, MMA offers an improvement by enabling adaptive selection of the "correct" $\epsilon$ as the margin individually for each datapoint. In addition, we rigorously analyze adversarial training with the perspective of margin maximization, and provide an alternative interpretation for adversarial training, maximizing either a lower or an upper bound of the margins. Our experiments empirically confirm our theory and demonstrate MMA training's efficacy on the MNIST and CIFAR10 datasets w.r.t. $\ell_\infty$ and $\ell_2$ robustness. Code and models are available at https://github.com/BorealisAI/mma_training.

Paper Structure

This paper contains 30 sections, 10 theorems, 35 equations, 4 figures, 15 tables, 2 algorithms.

Key Result

Theorem 2.1

Gradient descent on $L^{\text{LM}}_\theta(x + \delta^*, y)$ w.r.t. $\theta$ with a proper step size increases $d_\theta(x, y)$, where $\delta^* = \mathop{\mathrm{arg\,min}}\limits_{L^{\text{LM}}_\theta(x+\delta, y) \geq 0} \|\delta\|$ is the shortest successful perturbation given the current $\theta$.
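The statement of Theorem 2.1 can be checked by hand on a toy model where the margin has a closed form. The sketch below is not the authors' code. It assumes a binary linear classifier with logits $(0, w \cdot x + b)$ and true label 1, so that $L^{\text{LM}} = -(w \cdot x + b)$, the $\ell_2$ margin is $(w \cdot x + b)/\|w\|$, and $\delta^*$ is the projection of $x$ onto the decision boundary. All function names, the data point, and the step size are hypothetical choices made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def logit_margin_loss(w, b, x):
    # Assumed toy model: logits are (0, w.x + b) and the true label is 1,
    # so L^LM = (largest other logit) - (true logit) = 0 - (w.x + b).
    return 0.0 - (dot(w, x) + b)

def margin(w, b, x):
    # Signed l2 distance from x to the hyperplane w.x + b = 0.
    return (dot(w, x) + b) / math.sqrt(dot(w, w))

def shortest_perturbation(w, b, x):
    # Closed-form delta*: move x along -w until it lands on the boundary.
    scale = (dot(w, x) + b) / dot(w, w)
    return [-scale * wi for wi in w]

# Hypothetical parameters, data point, and step size.
w, b = [1.0, 1.0], -1.0
x = [2.0, 1.0]
lr = 0.1

delta_star = shortest_perturbation(w, b, x)
x_adv = [xi + di for xi, di in zip(x, delta_star)]
loss_at_boundary = logit_margin_loss(w, b, x_adv)  # zero on the boundary

margin_before = margin(w, b, x)

# One gradient-descent step on L^LM(x + delta*, y) w.r.t. (w, b),
# holding delta* fixed: dL/dw = -x_adv and dL/db = -1.
w_new = [wi + lr * xi for wi, xi in zip(w, x_adv)]
b_new = b + lr

margin_after = margin(w_new, b_new, x)
print(margin_before, margin_after)
```

In this linear case the gradient of the margin with respect to $(w, b)$ is $(x + \delta^*, 1)/\|w\|$, which is a positive multiple of the negative loss gradient at $x + \delta^*$, so a small step raises the margin (here from about 1.414 to about 1.547). For deep networks $\delta^*$ has no closed form and must be approximated, which is the role the summary assigns to AN-PGD.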

Figures (4)

  • Figure 1: Illustration of decision boundary, margin, and shortest successful perturbation on application of an adversarial perturbation.
  • Figure 2: A 1-D example on how margin is affected by decreasing the loss at different locations.
  • Figure 3: Visualization of loss landscape in the input space for MMA and PGD trained models.
  • Figure 4: Margin distributions during training, under the CIFAR10-$\ell_2$ case. Each blue histogram represents the margin value distribution of MMA-3.0, and the orange represents PGD-2.5.

Theorems & Definitions (21)

  • Theorem 2.1
  • Proposition 2.1
  • Remark 2.1
  • Proposition 2.2
  • Proposition 2.3
  • Remark 2.2
  • Proposition 2.4
  • Remark 2.3
  • Theorem 3.1
  • Remark 3.1
  • ...and 11 more