Stability and Generalization in Free Adversarial Training
Xiwei Cheng, Kexin Fu, Farzan Farnia
TL;DR
The paper analyzes how min-max optimization strategies in adversarial training affect generalization, comparing vanilla AT with Free AT through the lens of algorithmic stability. It provides nonconvex-nonconcave bounds showing Free AT can achieve a smaller generalization gap $\mathcal{E}_\text{gen}$ than vanilla AT by updating weights and perturbations simultaneously, and validates this with extensive experiments across multiple datasets and attacks. A variant, Free-TRADES, is proposed and shown to further reduce the generalization gap while preserving robustness. The results highlight stability-driven insights into robust generalization and suggest practical benefits of simultaneous min-max updates for adversarial learning. Overall, the work links algorithmic stability to improved generalization in adversarial settings and offers a path to more data-efficient, robust models.
Abstract
While adversarial training methods have significantly improved the robustness of deep neural networks against norm-bounded adversarial perturbations, the generalization gap between their performance on training and test data is considerably greater than that of standard empirical risk minimization. Recent studies have aimed to connect the generalization properties of adversarially trained classifiers to the min-max optimization algorithm used in their training. In this work, we analyze the interconnections between generalization and optimization in adversarial training using the algorithmic stability framework. Specifically, our goal is to compare the generalization gap of neural networks trained using the vanilla adversarial training method, which fully optimizes perturbations at every iteration, with the free adversarial training method, which simultaneously optimizes norm-bounded perturbations and classifier parameters. We prove bounds on the generalization error of these methods, indicating that the free adversarial training method may exhibit a lower generalization gap between training and test samples due to its simultaneous min-max optimization of classifier weights and perturbation variables. We conduct several numerical experiments to evaluate the train-to-test generalization gap in vanilla and free adversarial training methods. Our empirical findings also suggest that the free adversarial training method could lead to a smaller generalization gap over a similar number of training iterations. The paper code is available at https://github.com/Xiwei-Cheng/Stability_FreeAT.
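To make the contrast between the two update schemes concrete, the following is a minimal sketch, not the authors' implementation (see the repository above for that). It assumes a linear classifier with logistic loss and an L-infinity perturbation budget `eps`, and it works on a single sample for brevity. All function names and hyperparameters (`vanilla_at_step`, `free_at_step`, `pgd_steps`, `replays`, and so on) are hypothetical and chosen only for illustration. The vanilla step runs an inner ascent loop on the perturbation before each weight update, while the free step computes both gradients at the same point and updates weights and perturbation together, keeping the perturbation across calls.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def grads(w, x, y, delta):
    """Gradients of log(1 + exp(-y * <w, x + delta>)) w.r.t. w and delta."""
    xp = [xi + di for xi, di in zip(x, delta)]
    z = sum(wi * xi for wi, xi in zip(w, xp))
    c = -y * sigmoid(-y * z)  # derivative of the loss w.r.t. the logit z
    return [c * xi for xi in xp], [c * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def ascend(delta, g_delta, step, eps):
    # One signed-gradient ascent step, then projection onto the L-inf ball.
    return [max(-eps, min(eps, d + step * sign(g)))
            for d, g in zip(delta, g_delta)]


def vanilla_at_step(w, x, y, eps, lr, pgd_steps, pgd_lr):
    """Assumed vanilla AT step: optimize delta first, then update w once."""
    delta = [0.0] * len(x)
    for _ in range(pgd_steps):
        _, g_d = grads(w, x, y, delta)
        delta = ascend(delta, g_d, pgd_lr, eps)
    g_w, _ = grads(w, x, y, delta)
    return [wi - lr * gi for wi, gi in zip(w, g_w)]


def free_at_step(w, x, y, delta, eps, lr, replays, adv_lr):
    """Assumed free AT step: update w and delta simultaneously, reusing delta."""
    for _ in range(replays):
        g_w, g_d = grads(w, x, y, delta)  # both gradients at the same point
        w = [wi - lr * gi for wi, gi in zip(w, g_w)]
        delta = ascend(delta, g_d, adv_lr, eps)
    return w, delta


if __name__ == "__main__":
    w0, x, y = [1.0, -1.0], [0.5, 0.5], 1
    print(vanilla_at_step(w0, x, y, eps=0.1, lr=0.1, pgd_steps=3, pgd_lr=0.05))
    print(free_at_step(w0, x, y, [0.0, 0.0], eps=0.1, lr=0.1, replays=3, adv_lr=0.05))
```

In this sketch, each vanilla weight update costs `pgd_steps + 1` gradient evaluations, whereas each free replay yields one weight update and one perturbation update from a single evaluation. A practical implementation would operate on minibatches and a deep network rather than a single sample and a linear model.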
