Fast Adversarial Training against Sparse Attacks Requires Loss Smoothing
Xuyang Zhong, Yixiao Huang, Chen Liu
TL;DR
The paper addresses fast adversarial training against sparse $l_0$-bounded perturbations, where 1-step attacks degrade performance and trigger catastrophic overfitting (CO), a problem that the craggy loss landscape of the $l_0$ setting aggravates. It introduces Fast-LS-$l_0$, a smoothing-based approach that uses soft labels and a trade-off loss to smooth the adversarial loss landscape and mitigate CO. The authors provide theoretical results showing that the loss landscape is less smooth in the $l_0$ setting than in the $l_\infty$, $l_2$ and $l_1$ settings, and they validate empirically that loss smoothing reduces CO, narrows the gap between 1-step and multi-step training, and achieves state-of-the-art robustness against sparse attacks at a lower computational cost than multi-step adversarial training.
Abstract
This paper studies fast adversarial training against sparse adversarial perturbations bounded by the $l_0$ norm. We demonstrate the challenges of employing $1$-step attacks on $l_0$-bounded perturbations for fast adversarial training, including degraded performance and the occurrence of catastrophic overfitting (CO). We highlight that CO in $l_0$ adversarial training is caused by the sub-optimal perturbation locations chosen by the $1$-step attack. Theoretical and empirical analyses reveal that the loss landscape of $l_0$ adversarial training is more craggy than that of its $l_\infty$, $l_2$ and $l_1$ counterparts. Moreover, we corroborate that the craggy loss landscape can aggravate CO. To address these issues, we propose Fast-LS-$l_0$, which incorporates soft labels and a trade-off loss function to smooth the adversarial loss landscape. Extensive experiments demonstrate that our method can overcome catastrophic overfitting, achieve state-of-the-art performance, and narrow the performance gap between $1$-step and multi-step adversarial training against sparse attacks.
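To make the two smoothing ingredients concrete, the following is a minimal sketch of how soft labels and a trade-off loss might be combined for a single example. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formulation, so the TRADES-style form (cross-entropy on the clean input plus a weighted KL term between clean and adversarial predictions), the uniform label smoothing, and the names `soft_label`, `smoothed_tradeoff_loss`, `beta` and `smoothing` are all hypothetical choices.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label(true_class, num_classes, smoothing=0.1):
    """Assumed uniform label smoothing: 1 - smoothing on the true class,
    the remaining mass spread evenly over the other classes."""
    off_value = smoothing / (num_classes - 1)
    return [1.0 - smoothing if i == true_class else off_value
            for i in range(num_classes)]


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a (soft) target distribution and predictions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def smoothed_tradeoff_loss(clean_logits, adv_logits, true_class,
                           beta=6.0, smoothing=0.1):
    """Assumed TRADES-style objective with soft labels:
    CE(soft label, clean prediction) + beta * KL(clean || adversarial).
    beta and smoothing are illustrative values, not taken from the paper."""
    clean_probs = softmax(clean_logits)
    adv_probs = softmax(adv_logits)
    target = soft_label(true_class, len(clean_logits), smoothing)
    clean_term = cross_entropy(target, clean_probs)
    robust_term = kl_divergence(clean_probs, adv_probs)
    return clean_term + beta * robust_term


# Usage: the loss grows when the adversarial prediction drifts from the clean one.
clean = [2.0, 0.5, -1.0]
loss_unchanged = smoothed_tradeoff_loss(clean, clean, true_class=0)
loss_flipped = smoothed_tradeoff_loss(clean, [-1.0, 0.5, 2.0], true_class=0)
```

In this sketch the soft target keeps the clean term from pushing predictions toward a one-hot vector, and the KL term penalizes disagreement between clean and adversarial outputs rather than fitting the adversarial example to the label directly. Both are common ways to obtain a smoother objective; whether they match the paper's exact loss is an assumption.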
