Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization

Yaohua Liu, Jiaxin Gao, Xianghao Jiao, Zhu Liu, Xin Fan, Risheng Liu

TL;DR

This work illuminates the potential of leveraging the target model's historical states as a proxy to provide effective initialization and defense prior, which results in a general proxy guided defense framework, `LAST' ({\bf L}earn from the P{\bf ast}).

Abstract

Adversarial Training (AT), pivotal in fortifying the robustness of deep learning models, is extensively adopted in practical applications. However, prevailing AT methods, which rely on direct iterative updates for the target model's defense, frequently suffer from unstable training and catastrophic overfitting. In this context, our work illuminates the potential of leveraging the target model's historical states as a proxy to provide an effective initialization and defense prior, which results in a general proxy guided defense framework, `LAST' ({\bf L}earn from the P{\bf ast}). Specifically, LAST derives the response of the proxy model as dynamically learned fast weights, which continuously corrects the update direction of the target model. In addition, we introduce a self-distillation regularized defense objective, designed to steer the proxy model's update trajectory without resorting to external teacher models, thereby mitigating the impact of catastrophic overfitting on performance. Extensive experiments and ablation studies demonstrate the framework's efficacy in markedly improving model robustness (e.g., up to 9.2\% and 20.3\% improvement in robust accuracy on the CIFAR10 and CIFAR100 datasets, respectively) and training stability. These improvements are consistently observed across various model architectures, larger datasets, perturbation sizes, and attack modalities, affirming LAST's ability to refine both single-step and multi-step AT strategies. The code will be available at~\url{https://github.com/callous-youth/LAST}.
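
The abstract does not spell out the update rules, so the following is only a minimal sketch of one plausible reading of the idea, not the authors' implementation. It assumes a toy linear softmax classifier, an FGSM attack, a proxy that is simply the previous state of the target model, and a KL-based self-distillation term in which the model's own clean prediction is treated as a fixed teacher. All function names and the hyperparameters `eps`, `beta`, `gamma`, `mu`, and `temp` are hypothetical.

```python
import numpy as np

def softmax(z, temp=1.0):
    z = z / temp
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_grads(w, x, y):
    """Cross-entropy of a linear softmax classifier, with gradients
    w.r.t. the weights and w.r.t. the inputs."""
    n = x.shape[0]
    p = softmax(x @ w)
    loss = -np.log(p[np.arange(n), y] + 1e-12).mean()
    d = p.copy()
    d[np.arange(n), y] -= 1.0
    d /= n
    return loss, x.T @ d, d @ w.T

def fgsm(w, x, y, eps):
    """Single-step sign-gradient attack (assumed attack for this sketch)."""
    _, _, grad_x = ce_grads(w, x, y)
    return x + eps * np.sign(grad_x)

def self_distill(w, x_clean, x_adv, temp=2.0):
    """KL(clean prediction || adversarial prediction) of the same model.
    Assumption: the clean branch is a constant teacher (no gradient)."""
    p = softmax(x_clean @ w, temp)
    q = softmax(x_adv @ w, temp)
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=1).mean()
    grad_w = x_adv.T @ ((q - p) / temp) / x_adv.shape[0]
    return float(kl), grad_w

def last_style_step(theta, omega, x, y, eps=0.1, beta=0.5, gamma=0.8, mu=1.0):
    """One hypothetical proxy-guided step: theta is the target, omega the proxy."""
    x_adv = fgsm(omega, x, y, eps)            # attack the proxy model
    _, g_ce, _ = ce_grads(omega, x_adv, y)    # defense loss gradient
    _, g_sd = self_distill(omega, x, x_adv)   # self-distillation regularizer
    fast = omega - beta * (g_ce + mu * g_sd)  # fast weights from the proxy response
    new_theta = theta - gamma * (theta - fast)  # correct the target's direction
    return new_theta, theta.copy()            # proxy becomes the past target state

# Toy data: two classes separated along the first feature.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 5))
y = (x[:, 0] > 0).astype(int)
theta = np.zeros((5, 2))
omega = theta.copy()
for _ in range(50):
    theta, omega = last_style_step(theta, omega, x, y)
```

In this sketch the target never takes a gradient step directly: it only moves toward the fast weights computed from its own earlier state, which is one way to read "historical states as a proxy". The actual framework may differ in how the proxy is updated and how the two models are combined.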

Paper Structure (21 sections, 2 theorems, 7 equations, 8 figures, 6 tables, 1 algorithm)


Key Result

Lemma 1

If $\mathcal{L}_\mathrm{def}$ is $L$-smooth, the updates to the proxy model $\tilde{\omega}$ are bounded, ensuring that the sequence $\{\theta_i\}$ exhibits stable convergence behavior.
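
For reference, the standard meaning of the $L$-smoothness hypothesis and the textbook bound it gives for a single gradient step can be written as below. This is general background, not the paper's proof; the points $a$, $b$ and the step size $\beta$ are illustrative symbols that do not come from the lemma statement.

```latex
% Standard L-smoothness definition and its descent bound.
% a, b: arbitrary parameter vectors; \beta: assumed gradient step size.
\begin{align*}
\|\nabla \mathcal{L}_\mathrm{def}(a) - \nabla \mathcal{L}_\mathrm{def}(b)\|
  &\le L\,\|a - b\| \quad \text{for all } a, b, \\
\mathcal{L}_\mathrm{def}(b)
  &\le \mathcal{L}_\mathrm{def}(a)
     + \langle \nabla \mathcal{L}_\mathrm{def}(a),\, b - a \rangle
     + \tfrac{L}{2}\,\|b - a\|^2, \\
b = a - \beta\,\nabla \mathcal{L}_\mathrm{def}(a)
  \;&\Longrightarrow\;
  \mathcal{L}_\mathrm{def}(b)
  \le \mathcal{L}_\mathrm{def}(a)
     - \beta\left(1 - \tfrac{L\beta}{2}\right)
       \|\nabla \mathcal{L}_\mathrm{def}(a)\|^2 .
\end{align*}
```

Under these assumptions, any step size $0 < \beta < 2/L$ does not increase the loss, and the step length $\|b - a\| = \beta\,\|\nabla \mathcal{L}_\mathrm{def}(a)\|$ is controlled by the gradient norm, which is the kind of bounded update the lemma refers to.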

Figures (8)

  • Figure 1: Comparison of the adversarial loss landscapes of models trained by the original SAT methods and by their LAST-improved versions. We also report the gap between the maximum and minimum losses for all the landscapes with $x, y\in[-0.25,0.25]$. The loss ranges of the upper and lower surfaces are labeled on the left side of the axis. The models trained by LAST exhibit significantly lower loss and smoother loss landscapes, along with smaller loss gaps.
  • Figure 2: Comparison of heat maps of the input gradient w.r.t. the clean example $\boldsymbol{u}$ and the adversarial example $\boldsymbol{u}_{\mathtt{adv}}$ between the target model $\mathcal{T}_{\boldsymbol{\theta}}$ and the introduced proxy model $\mathcal{P}_{\boldsymbol{\omega}}$. As shown, $\mathcal{P}_{\boldsymbol{\omega}}$ exhibits much less gradient variation in the Red, Green and Blue channels. It also shows a smaller growth in loss, i.e., (c)$\rightarrow$(f), and a more salient input gradient w.r.t. $\boldsymbol{u}_{\mathtt{adv}}$ around the shape of the horse compared with $\mathcal{T}_{\boldsymbol{\theta}}$, i.e., (b)$\rightarrow$(e).
  • Figure 3: Comparison of the attack and defense process between different paradigms. (a) SAT framework. (b) The LAST framework. (c) Description of the symbols. To avoid redundancy, the details of the inner maximization process have been simplified in subfigure (b).
  • Figure 4: Subfigure (a) compares the convergence behavior of test loss and RA for Fast-AT and ours on the CIFAR10 dataset under PGD-10 attack with $\boldsymbol{\epsilon}=16/255$. Subfigure (b) compares the loss landscapes of Fast-AT and our version. The gap between the maximum and minimum losses is calculated within the range of $x, y\in[-0.5,0.5]$.
  • Figure 5: The four subfigures compare the convergence behavior of test robust loss and RA for models trained with PGD-AT and LAST ($\boldsymbol{\epsilon}=8/255$) on the CIFAR10 and CIFAR100 datasets. The black dashed line denotes the epoch where the multi-step learning rate decays.
  • ...and 3 more figures

Theorems & Definitions (4)

  • Lemma 1
  • proof
  • Theorem 1
  • proof