Recent Advances in Adversarial Training for Adversarial Robustness
Tao Bai, Jinqi Luo, Jun Zhao, Bihan Wen, Qian Wang
TL;DR
The paper surveys recent advances in adversarial training (AT) as a defense against adversarial examples, introducing a novel taxonomy and analyzing generalization gaps. It traces the field from the origin of AT through seven methodological families: adversarial regularization, curriculum-based AT, ensemble AT, adaptive AT, semi-/unsupervised AT, efficient AT, and other variants. It highlights persistent challenges such as non-convex min-max optimization, overfitting, and poor generalization to unseen attacks, and discusses potential future directions beyond AT. The work also recaps benchmarks and offers insights into robustness-accuracy trade-offs and the practical implications of deploying robust models.
Abstract
Adversarial training is one of the most effective approaches to defending deep learning models against adversarial examples. Unlike other defense strategies, adversarial training aims to improve the intrinsic robustness of models. Over the last few years, adversarial training has been studied and discussed from various perspectives, and a variety of improvements and extensions have been proposed; however, existing surveys have overlooked them. In this survey, we present the first systematic review of recent progress on adversarial training for adversarial robustness, organized under a novel taxonomy. We then discuss the generalization problems of adversarial training from three perspectives. Finally, we highlight the challenges that have not yet been fully addressed and present potential future directions.
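Adversarial training is usually written as a min-max problem: minimize over model parameters θ the expected value of max over perturbations with ||δ||∞ ≤ ε of L(θ, x + δ, y). The sketch below illustrates that loop on a toy problem; it is not the method of any particular paper covered by the survey. It assumes binary logistic regression with labels in {-1, +1}, an L∞ threat model, NumPy, and synthetic Gaussian data. The helper names (pgd_attack, adversarial_train, robust_accuracy) and the step-size choice are hypothetical illustrations. The inner maximization uses projected gradient ascent (PGD) with sign steps, and the outer minimization uses plain gradient descent on the resulting adversarial examples.

```python
import numpy as np


def sigmoid(z):
    # Logistic function written via tanh to avoid overflow in exp.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def logistic_loss(w, b, X, y):
    # Mean logistic loss log(1 + exp(-y * f(x))) for labels y in {-1, +1}.
    margins = y * (X @ w + b)
    return float(np.mean(np.logaddexp(0.0, -margins)))


def pgd_attack(w, b, X, y, eps, alpha, steps, rng):
    """Inner maximization (hypothetical helper): L-inf PGD with a random start.

    No clipping to a valid input range is applied; the toy data has none.
    """
    delta = rng.uniform(-eps, eps, size=X.shape)
    for _ in range(steps):
        margins = y * ((X + delta) @ w + b)
        # d(loss)/d(logit) per sample; by the chain rule d(loss)/dx = coef * w.
        coef = -sigmoid(-margins) * y
        grad_x = coef[:, None] * w[None, :]
        delta = delta + alpha * np.sign(grad_x)  # ascent step on the loss
        delta = np.clip(delta, -eps, eps)        # project back onto the eps-ball
    return X + delta


def adversarial_train(X, y, eps, epochs=200, lr=0.1, steps=10, seed=0):
    """Outer minimization (hypothetical helper): gradient descent on PGD examples."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    # Assumed step size: steps * alpha exceeds the eps-ball diameter (2 * eps).
    alpha = 2.5 * eps / steps
    for _ in range(epochs):
        X_adv = pgd_attack(w, b, X, y, eps, alpha, steps, rng)
        margins = y * (X_adv @ w + b)
        coef = -sigmoid(-margins) * y
        w = w - lr * (X_adv.T @ coef) / len(y)
        b = b - lr * float(np.mean(coef))
    return w, b


def robust_accuracy(w, b, X, y, eps):
    # For a linear model the worst-case L-inf perturbation is known exactly:
    # it lowers every margin by eps * ||w||_1.
    worst_margins = y * (X @ w + b) - eps * np.sum(np.abs(w))
    return float(np.mean(worst_margins > 0))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 200
    X = np.vstack([rng.normal(3.0, 0.5, (n, 2)), rng.normal(-3.0, 0.5, (n, 2))])
    y = np.concatenate([np.ones(n), -np.ones(n)])
    w, b = adversarial_train(X, y, eps=0.5)
    print("robust accuracy:", robust_accuracy(w, b, X, y, eps=0.5))
```

Because the model is linear, robust_accuracy can evaluate the worst case in closed form; for deep networks one would instead evaluate with an attack such as PGD, which only yields an upper bound on true robust accuracy. This toy objective is also convex in w, so it sidesteps the non-convex min-max difficulty that the survey highlights for deep models.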
