Asymptotic Behavior of Adversarial Training Estimator under $\ell_\infty$-Perturbation
Yiling Xie, Xiaoming Huo
TL;DR
This work analyzes adversarial training under $\ell_\infty$-perturbation in generalized linear models. At the critical perturbation order $\delta_n=\eta/\sqrt{n}$, the asymptotic distribution of the estimator can place positive probability mass at zero when $\beta^*=0$. Decomposing the regularization effect of the inner maximization reveals a gradient-based regularization coupled with an $\ell_1$ penalty, which explains the observed sparsity. The authors propose adaptive adversarial training, a two-step procedure that uses the ERM estimator to weight perturbations; it achieves asymptotic variable-selection consistency and, for $1/2<\gamma<1$, asymptotic unbiasedness. Theory, simulations, and real-data experiments indicate better sparsity recovery and estimation accuracy for adaptive adversarial training than for classic adversarial training. The findings connect distributionally robust optimization, LASSO-type regularization, and adversarial robustness, and suggest an approach to sparse, robust inference under worst-case perturbations.
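As a worked illustration of the coupling between a gradient term and an $\ell_1$ penalty, consider the special case of linear regression with squared loss. This is an assumption made here for concreteness; it is not the paper's general derivation for generalized linear models. Writing $r_i(\beta)=y_i-x_i^\top\beta$ for the residual (notation introduced only for this example), the inner maximization has a closed form by duality between the $\ell_\infty$ and $\ell_1$ norms:

$$
\max_{\|\Delta\|_\infty\le\delta}\bigl(y_i-(x_i+\Delta)^\top\beta\bigr)^2
=\bigl(|r_i(\beta)|+\delta\|\beta\|_1\bigr)^2
= r_i(\beta)^2 + 2\delta\,|r_i(\beta)|\,\|\beta\|_1 + \delta^2\|\beta\|_1^2 .
$$

The derivative of the squared loss with respect to the linear predictor has magnitude $2|r_i(\beta)|$, so the cross term is $\delta$ times a gradient magnitude times $\|\beta\|_1$. With $\delta_n=\eta/\sqrt{n}$, the cross term carries a factor $n^{-1/2}$ and the last term a factor $n^{-1}$.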
Abstract
Adversarial training has been proposed to protect machine learning models against adversarial attacks. This paper focuses on adversarial training under $\ell_\infty$-perturbation, which has recently attracted much research attention. The asymptotic behavior of the adversarial training estimator is investigated in the generalized linear model. The results imply that the asymptotic distribution of the adversarial training estimator under $\ell_\infty$-perturbation can put positive probability mass at $0$ when the true parameter is $0$, providing a theoretical guarantee of the associated sparsity-recovery ability. In addition, a two-step procedure, adaptive adversarial training, is proposed, which can further improve the performance of adversarial training under $\ell_\infty$-perturbation. Specifically, the proposed procedure can achieve asymptotic variable-selection consistency and unbiasedness. Numerical experiments are conducted to show the sparsity-recovery ability of adversarial training under $\ell_\infty$-perturbation and to compare the empirical performance of classic adversarial training and adaptive adversarial training.
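To make the worst-case loss under $\ell_\infty$-perturbation concrete, the following minimal sketch assumes linear regression with squared loss, a special case chosen here for illustration rather than the paper's general setting. It compares the closed-form worst-case loss $(|y-x^\top\beta|+\delta\|\beta\|_1)^2$ with a brute-force maximum over the corners of the perturbation ball. The function names `worst_case_sq_loss` and `brute_force_sq_loss` and the numeric values are hypothetical, not taken from the paper.

```python
import itertools

import numpy as np


def worst_case_sq_loss(x, y, beta, delta):
    """Closed-form max over ||d||_inf <= delta of (y - (x + d) @ beta)**2."""
    return (abs(y - x @ beta) + delta * np.abs(beta).sum()) ** 2


def brute_force_sq_loss(x, y, beta, delta):
    """Same maximum, found by checking all 2^d corners of the l_inf ball.

    The loss is convex in the perturbation, so its maximum over the
    hypercube is attained at a corner.
    """
    corners = itertools.product([-delta, delta], repeat=len(beta))
    return max((y - (x + np.array(c)) @ beta) ** 2 for c in corners)


# Illustrative data only; the seed and sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
x, beta = rng.normal(size=4), rng.normal(size=4)
y = rng.normal()
delta = 0.3

closed = worst_case_sq_loss(x, y, beta, delta)
brute = brute_force_sq_loss(x, y, beta, delta)
print(np.isclose(closed, brute))
```

The $\ell_1$ norm of $\beta$ appears inside the worst-case loss, which is the source of the LASSO-like shrinkage toward zero. The brute-force check is exponential in the dimension and is included only to validate the closed form on a small example.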
