MMA Training: Direct Input Space Margin Maximization through Adversarial Training
Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, Ruitong Huang
TL;DR
Adversarial training with a fixed perturbation budget applies the same $\epsilon$ to every example. This paper instead frames robustness as maximizing each example's input-space margin $d_\theta(x,y)$, the distance from the input to the classifier's decision boundary. It introduces Max-Margin Adversarial (MMA) training, which maximizes margins up to a threshold $d_{\max}$ by minimizing a cross-entropy surrogate at an approximate shortest successful perturbation $\delta^*$. The authors derive the gradient of the margin under both smooth and non-smooth settings, propose AN-PGD to find $\delta^*$, and add a clean loss term to stabilize optimization. Experiments on MNIST and CIFAR-10 under $\ell_\infty$ and $\ell_2$ norms show that MMA improves robustness and is less sensitive to hyperparameter choice, while remaining competitive with ensembles and TRADES. The work contributes both a theoretical account of margin-based defenses and a practical training algorithm.
Abstract
We study adversarial robustness of neural networks from a margin maximization perspective, where margins are defined as the distances from inputs to a classifier's decision boundary. Our study shows that maximizing margins can be achieved by minimizing the adversarial loss on the decision boundary at the "shortest successful perturbation", demonstrating a close connection between adversarial losses and the margins. We propose Max-Margin Adversarial (MMA) training to directly maximize the margins to achieve adversarial robustness. Instead of adversarial training with a fixed $\epsilon$, MMA offers an improvement by enabling adaptive selection of the "correct" $\epsilon$ as the margin individually for each datapoint. In addition, we rigorously analyze adversarial training from the perspective of margin maximization, and provide an alternative interpretation for adversarial training: maximizing either a lower or an upper bound of the margins. Our experiments empirically confirm our theory and demonstrate MMA training's efficacy on the MNIST and CIFAR-10 datasets w.r.t. $\ell_\infty$ and $\ell_2$ robustness. Code and models are available at https://github.com/BorealisAI/mma_training.
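To make the notions of margin and "shortest successful perturbation" concrete, the sketch below computes both in closed form for a linear multiclass classifier under the $\ell_2$ norm. This is an illustrative assumption, not the paper's setting: for neural networks no closed form exists, which is why MMA approximates the shortest perturbation with a search procedure. The function name `linear_l2_margin` and the toy weights are hypothetical.

```python
import math

def linear_l2_margin(W, b, x, y):
    """L2 margin of x for the classifier argmax_j (W[j] . x + b[j]).

    Returns (margin, delta), where delta is a shortest perturbation that moves
    x onto the decision boundary. A misclassified input has margin 0.
    Illustrative only: assumes a linear model, unlike the networks in the paper.
    """
    logits = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    predicted = max(range(len(logits)), key=logits.__getitem__)
    if predicted != y:
        return 0.0, [0.0] * len(x)

    best_dist, best_delta = math.inf, None
    for j, row in enumerate(W):
        if j == y:
            continue
        # The boundary between classes y and j is the hyperplane
        # (W[y] - W[j]) . x + (b[y] - b[j]) = 0.
        w_diff = [a - c for a, c in zip(W[y], row)]
        norm = math.sqrt(sum(v * v for v in w_diff))
        if norm == 0.0:
            continue  # identical weight rows: no boundary in input space
        dist = (logits[y] - logits[j]) / norm  # point-to-hyperplane distance
        if dist < best_dist:
            best_dist = dist
            # Step against the normal direction, just far enough to reach the boundary.
            best_delta = [-dist * v / norm for v in w_diff]
    return best_dist, best_delta

# Toy usage: two classes in 2-D with identity weights.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
margin, delta = linear_l2_margin(W, b, [3.0, 1.0], 0)
print(margin, delta)  # margin is about sqrt(2); delta is about [-1, 1]
```

In this toy case the perturbed point $x + \delta^*$ lands at roughly $(2, 2)$, where both logits tie. The per-example margin returned here plays the role of the adaptive $\epsilon$ described in the abstract.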
