Fast Adversarial Training against Sparse Attacks Requires Loss Smoothing

Xuyang Zhong, Yixiao Huang, Chen Liu

TL;DR

The paper addresses fast adversarial training against sparse, $l_0$-bounded perturbations, where training with 1-step attacks leads to catastrophic overfitting (CO) that is aggravated by a craggy loss landscape. It introduces Fast-LS-$l_0$, a smoothing-based approach that uses soft labels and a trade-off loss to smooth the adversarial loss landscape and mitigate CO. The authors show theoretically that the adversarial loss is less smooth in the $l_0$ setting than in the $l_\infty$, $l_2$ and $l_1$ settings, and validate empirically that loss smoothing reduces CO, narrows the gap between 1-step and multi-step training, and achieves state-of-the-art robustness against sparse attacks. Because it relies on 1-step attacks, the approach reaches competitive robustness at substantially lower computational cost than multi-step adversarial training.
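To make the $l_0$ constraint concrete, the following is a minimal sketch of projecting a perturbation onto an $l_0$ ball by keeping only its largest-magnitude entries. It is an illustration under assumptions, not the attack used in the paper: the function names `project_l0` and `l0_norm`, the flat-list representation, and the tie-breaking rule are all hypothetical choices made here.

```python
def project_l0(delta, eps):
    """Keep the eps largest-magnitude entries of delta and zero out the rest.

    Illustrative helper (not from the paper): delta is a flat list of floats,
    eps is the maximum number of non-zero entries allowed.
    """
    if eps <= 0:
        return [0.0] * len(delta)
    # Sort indices by descending magnitude; ties go to the lower index.
    order = sorted(range(len(delta)), key=lambda i: (-abs(delta[i]), i))
    keep = set(order[:eps])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def l0_norm(delta):
    """Number of non-zero entries, i.e. the l0 'norm' of the perturbation."""
    return sum(1 for d in delta if d != 0)


delta = [0.1, -0.9, 0.0, 0.4, -0.3]
sparse = project_l0(delta, eps=2)
print(sparse, l0_norm(sparse))
```

Which entries survive depends discretely on the magnitudes, so a small change in the input can switch the selected locations. This is one intuitive way to see why perturbation locations matter in the sparse setting.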

Abstract

This paper studies fast adversarial training against sparse adversarial perturbations bounded by $l_0$ norm. We demonstrate the challenges of employing $1$-step attacks on $l_0$ bounded perturbations for fast adversarial training, including degraded performance and the occurrence of catastrophic overfitting (CO). We highlight that CO in $l_0$ adversarial training is caused by sub-optimal perturbation locations of $1$-step attack. Theoretical and empirical analyses reveal that the loss landscape of $l_0$ adversarial training is more craggy compared to its $l_\infty$, $l_2$ and $l_1$ counterparts. Moreover, we corroborate that the craggy loss landscape can aggravate CO. To address these issues, we propose Fast-LS-$l_0$ that incorporates soft labels and the trade-off loss function to smooth the adversarial loss landscape. Extensive experiments demonstrate our method can overcome the challenge of catastrophic overfitting, achieve state-of-the-art performance, and narrow down the performance gap between $1$-step and multi-step adversarial training against sparse attacks.
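The abstract names two smoothing ingredients, soft labels and a trade-off loss. The sketch below shows one way such a combination could look. It uses plain label smoothing as a stand-in for soft labels and a TRADES-style term (clean cross-entropy plus a weighted KL divergence between clean and adversarial predictions) as the trade-off loss. The soft-label scheme, the function names, and the values of `alpha` and `beta` are assumptions for illustration and may differ from what Fast-LS-$l_0$ actually uses.

```python
import math

TINY = 1e-12  # floor to keep logarithms finite in this toy example


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_labels(label, num_classes, alpha):
    """Label smoothing: mass (1 - alpha) on the true class, alpha spread uniformly."""
    return [(1.0 - alpha) * (1.0 if k == label else 0.0) + alpha / num_classes
            for k in range(num_classes)]


def cross_entropy(target, probs):
    return -sum(t * math.log(max(p, TINY)) for t, p in zip(target, probs))


def kl_divergence(p, q):
    return sum(pi * math.log(max(pi, TINY) / max(qi, TINY)) for pi, qi in zip(p, q))


def tradeoff_loss(clean_logits, adv_logits, label, alpha=0.1, beta=6.0):
    """Clean cross-entropy on soft labels plus beta * KL(clean || adversarial).

    alpha and beta are hypothetical defaults chosen for this sketch.
    """
    p_clean = softmax(clean_logits)
    p_adv = softmax(adv_logits)
    soft = smooth_labels(label, len(clean_logits), alpha)
    return cross_entropy(soft, p_clean) + beta * kl_divergence(p_clean, p_adv)


same = tradeoff_loss([2.0, 0.0, -1.0], [2.0, 0.0, -1.0], label=0)
flipped = tradeoff_loss([2.0, 0.0, -1.0], [0.0, 2.0, -1.0], label=0)
print(same, flipped)
```

When the adversarial logits match the clean ones, the KL term vanishes and only the soft-label cross-entropy remains. The penalty grows as the adversarial prediction drifts away from the clean one.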

Paper Structure

This paper contains 33 sections, 6 theorems, 36 equations, 6 figures, 13 tables, 3 algorithms.

Key Result

Theorem 4.2

(Lipschitz continuity of adversarial loss) If Assumption assum_lip holds, we have: The constant $A_{{\bm{\theta}}} = 2\sum_{i\in\mathcal{S}_+}y_iL_{\bm{\theta}}$ where $\mathcal{S}_+=\{i~|~y_i\geq 0, h_i({\bm{x}}+{\boldsymbol{\delta}}_1, {\bm{\theta}}_2)>h_i({\bm{x}}+{\boldsymbol{\delta}}_1, {\bm{\theta}}_1)\}$, ${\boldsymbol{\delta}}_1 \in \mathop{\mathrm{arg\,max}}\limits_{{\bo

Figures (6)

  • Figure 1: The learning curves of adversarial training against $1$-step sPGD with random noise initialization. The models are PreactResNet-18 trained on CIFAR-10. The dashed and solid lines represent accuracy on the training and test sets, respectively. The test robust accuracy is based on sAA with $\epsilon = 20$. The values of $\epsilon$ used in training are shown as $\epsilon_{train}$; the training robust accuracy is based on $1$-step sPGD with $\epsilon_{train}$.
  • Figure 2: Visualization of location difference and location overlapping. (a) The distribution of the normalized $l_0$ distance between training adversarial examples generated by 1-step sPGD and sAA. The models trained on $20$-step sAT with different training $\epsilon$ are evaluated. (b) The distribution of the location overlapping rate between the perturbations generated by attacks used in training ($20$-step sPGD) and test (sAA), where $\epsilon_{test}=20$. The models trained on $20$-step sAT with different training $\epsilon$ are evaluated.
  • Figure 3: Smoothness of adversarial loss objective functions. All losses are calculated on the training set of CIFAR-10 by PreactResNet-18. The $l_0$, $l_1$, $l_2$ and $l_\infty$ models are obtained by $1$-step sAT [zhong2024efficient], Fast-EG-$l_1$ [jiang2023towards], $1$-step PGD [rice2020overfitting] and GradAlign [andriushchenko2020square], respectively. (a) Top $10$ eigenvalues of $\nabla_{{\bm{\theta}}}^2\mathcal{L}_{\epsilon}^{(0)}({\bm{x}}, {\bm{\theta}})$ with different values of $\epsilon_{train}$ in the $l_0$ case. (b) Top $10$ eigenvalues of $\nabla_{{\bm{\theta}}}^2\mathcal{L}_{\epsilon}^{(p)}({\bm{x}}, {\bm{\theta}})$ under different choices of $p$, including $0$, $1$, $2$ and $\infty$. The y-axis is shown on a log scale. (c)-(f) The loss landscape of $\mathcal{L}_{\epsilon}({\bm{x}}, {\bm{\theta}}+\alpha_1{\bm{v}}_1+\alpha_2{\bm{v}}_2)$ where ${\bm{v}}_1$ and ${\bm{v}}_2$ are the eigenvectors associated with the top $2$ eigenvalues of $\nabla_{{\bm{\theta}}}^2\mathcal{L}_{\epsilon}({\bm{x}}, {\bm{\theta}})$, respectively. (c) $l_0$ case, $\epsilon_{train} = 1$. (d) $l_1$ case, $\epsilon_{train}=24$. (e) $l_2$ case, $\epsilon_{train}=0.5$. (f) $l_\infty$ case, $\epsilon_{train}=8/255$.
  • Figure 4: Relationship between craggy loss landscape and CO. (a) Gradient norm $\|\nabla_{{\bm{\theta}}_{t}}\mathcal{L}_{\epsilon}\|_2$. (b) Test robust accuracy against sAA ($\epsilon=20$). The results are obtained from PreactResNet-18 trained on CIFAR-10 with $\epsilon_{train}=40$. Since the training of $20$-step sAT w/o ES diverges under $\epsilon_{train}=120$, the results are presented under $\epsilon_{train}=40$ instead.
  • Figure 5: Loss landscape of $1$-step sAT [zhong2024efficient] with different $\epsilon$ values on the training set of CIFAR-10 [krizhevsky2009learning]. The architecture of the model is PreactResNet-18. (a) Landscape of $\mathcal{L}_{\epsilon}^{(0)}({\bm{x}}, {\bm{\theta}}+\alpha_1{\bm{v}}_1+\alpha_2{\bm{v}}_2)$ with $\epsilon=20$, where ${\bm{v}}_1$ and ${\bm{v}}_2$ are the eigenvectors corresponding to the top 2 eigenvalues of the Hessian matrices, respectively. (b) Landscape of $\mathcal{L}_{\epsilon}^{(0)}$ with $\epsilon=40$. (c) Landscape of $\mathcal{L}_{\epsilon}^{(0)}$ with $\epsilon=120$.
  • ...and 1 more figure
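The captions of Figures 3 and 5 refer to the top eigenvalues and eigenvectors of the Hessian of the loss with respect to the parameters. As a rough illustration of how a leading Hessian eigenvalue can be estimated without forming the Hessian, here is a small sketch using power iteration with finite-difference Hessian-vector products. The quadratic `toy_loss`, the step sizes, and all function names are assumptions made for this example; the paper's actual procedure and tooling are not specified in this section.

```python
import math


def grad(f, theta, h=1e-5):
    """Central-difference gradient of f at theta."""
    g = []
    for i in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g


def hvp(f, theta, v, h=1e-4):
    """Hessian-vector product via a central difference of gradients along v."""
    g_plus = grad(f, [t + h * vi for t, vi in zip(theta, v)])
    g_minus = grad(f, [t - h * vi for t, vi in zip(theta, v)])
    return [(a - b) / (2 * h) for a, b in zip(g_plus, g_minus)]


def top_eigenvalue(f, theta, iters=100):
    """Power iteration; converges to the eigenvalue of largest magnitude."""
    v = [1.0 / math.sqrt(len(theta))] * len(theta)
    lam = 0.0
    for _ in range(iters):
        hv = hvp(f, theta, v)
        lam = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient for unit v
        norm = math.sqrt(sum(a * a for a in hv))
        if norm == 0.0:
            break
        v = [a / norm for a in hv]
    return lam


def toy_loss(theta):
    # Hypothetical smooth loss whose Hessian is diag(3, 1).
    return 0.5 * (3.0 * theta[0] ** 2 + 1.0 * theta[1] ** 2)


lam = top_eigenvalue(toy_loss, [0.2, -0.1])
print(lam)
```

For the toy quadratic the estimate should be close to 3, the larger diagonal entry of its Hessian. For a real network one would typically replace the finite differences with automatic differentiation, which this sketch deliberately avoids to stay dependency-free.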

Theorems & Definitions (10)

  • Theorem 4.2
  • Theorem 4.4
  • proof
  • proof
  • Proposition C.1
  • proof
  • Proposition C.2
  • proof
  • Corollary D.2
  • Corollary D.3