Maintaining Adversarial Robustness in Continuous Learning

Xiaolei Ru, Xiaowei Cao, Zijia Liu, Jack Murdoch Moore, Xin-Ya Zhang, Xia Zhu, Wenjia Wei, Gang Yan

TL;DR

This paper tackles the challenge of preserving adversarial robustness when a neural network learns a sequence of tasks without revisiting previous data. It introduces Double Gradient Projection (DGP), a gradient-projection framework that constrains weight updates to be orthogonal to subspaces derived from previous tasks, so that the sample gradients of earlier data stay stable. DGP is designed to work alongside defense methods that smooth sample gradients, such as Input Gradient Regularization (IGR) and Adversarial Training (AT). It performs layer-wise SVD to derive bases that stabilize both the final outputs and the sample gradients. With these bases, DGP maintains robustness on four benchmarks (Permuted MNIST, Rotated MNIST, Split-CIFAR100, Split-miniImageNet) under AutoAttack, PGD, and FGSM attacks, while keeping continual-learning performance competitive. The work also shows that naively combining defense methods with existing continual-learning baselines leads to incompatibilities, and it discusses a limitation: plasticity decreases as the pool of base vectors grows.

Abstract

Adversarial robustness is essential for the security and reliability of machine learning systems. However, adversarial robustness enhanced by defense algorithms is easily erased as the neural network's weights are updated to learn new tasks. To address this vulnerability, it is essential to improve the capability of neural networks in terms of robust continual learning. Specifically, we propose a novel gradient projection technique that effectively stabilizes sample gradients from previous data by orthogonally projecting back-propagation gradients onto a crucial subspace before using them for weight updates. This technique can maintain robustness by collaborating with a class of defense algorithms through sample gradient smoothing. The experimental results on four benchmarks, including Split-CIFAR100 and Split-miniImageNet, demonstrate the superiority of the proposed approach in mitigating the rapid degradation of robustness during continual learning, even when facing strong adversarial attacks.

Paper Structure (30 sections, 18 equations, 8 figures, 5 tables, 1 algorithm)

This paper contains 30 sections, 18 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: Feeding data $\mathbf{X}_{p}$ into an exemplar neural network after learning task $\mathcal{T}_{p}$ and $\mathcal{T}_{t} \left ( p < t \right )$ respectively. $\Delta \mathbf{W}_{p,t}^{l}$ denotes the change of weights in task $\mathcal{T}_{t}$ relative to task $\mathcal{T}_{p}$. If $\Delta \mathbf{W}_{p,t}^{1}$ meets the constraint $\mathbf{X}_{p} \Delta \mathbf{W}_{p,t}^{1} =0$, then $\mathbf{X}_{p,p}^{2}$ is equal to $\mathbf{X}_{p,t}^{2}$. Recursively, the final outputs $\hat{\mathbf{Y}} _{p,t}$ and $\hat{\mathbf{Y}} _{p,p}$ will be identical even if the weights of the neural network are updated. Moreover, if $\Delta \mathbf{W}_{p,t}^{l}$ meets another constraint $\frac{\partial\mathbf{X} ^{l}_{p,t} }{\partial\mathbf{X}_{p}} \Delta\mathbf{W}_{p,t}^{l} = 0$, the sample gradients $\frac{\partial\hat{\mathbf{Y}} _{p,t} }{\partial\mathbf{X}_{p}}$ and $\frac{\partial\hat{\mathbf{Y}} _{p,p} }{\partial\mathbf{X}_{p}}$ will be identical.
  • Figure 2: Graphical representation of the constraints imposed in DGP. (a) $\mathbf{X}^{l}$ (or $\frac{\partial\mathbf{X} ^{l} }{\partial\mathbf{X}}$) is approximated by $\mathbf{U}^{l}_{k}\mathbf{\Lambda}^{l}_{k} \left ( \mathbf{V}^{l}_{k} \right ) ^{\mathrm{T}}$. (b) If the product of $\left ( \mathbf{V}^{l}_{k} \right ) ^{\mathrm{T}}$ and $\Delta\mathbf{W}^{l}$ is zero, then the product of $\mathbf{X}^{l}$ (or $\frac{\partial\mathbf{X} ^{l} }{\partial\mathbf{X}}$) and $\Delta\mathbf{W}^{l}$ is approximately zero. Consequently, weight updates $\Delta\mathbf{W}^{l}$ have little impact on $\mathbf{X}^{l+1}$ (or $\frac{\partial\mathbf{X} ^{l+1} }{\partial\mathbf{X}}$) of previous tasks.
  • Figure 3: ACC as a function of the number of learned tasks on the Permuted MNIST (first row), Rotated MNIST (second row), CIFAR100 (third row) and miniImageNet (fourth row) datasets. ACC is measured on adversarial samples generated by AutoAttack (first column), PGD (second column) and FGSM (third column), as well as on original samples (fourth column). The horizontal axis indicates the number of tasks the neural network has learned so far. The defense algorithm used here is IGR. Error bars denote standard deviation.
  • Figure 4: As in Fig. 3, but for the defense algorithm Adversarial Training (AT) on the Permuted MNIST (PMNIST) dataset. Here, we combine AT with the continual learning methods GEM and GPM, which showed higher ACC than the other baselines in Fig. 3.
  • Figure 5: Variation in the gradients of samples from the first task $\mathcal{T} _{1}$ during the continual learning process when training with IGR. The variation is quantified by similarity.
  • ...and 3 more figures
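
The constraint described in the Figure 2 caption can be illustrated with a small numerical sketch. This is not the authors' implementation. It only shows the general idea under stated assumptions: the matrix `X` is a synthetic, exactly low-rank stand-in for a layer input from an earlier task, `G` is a random stand-in for a back-propagated update, and the helper name `project_out` is hypothetical. The sketch takes the leading right-singular vectors of `X` as the basis $\mathbf{V}_{k}$, removes the component of the update that lies in their span, and checks that the projected update leaves `X @ W` essentially unchanged.

```python
import numpy as np


def project_out(M, G, k):
    """Remove from update G the component lying in the span of the
    k leading right-singular vectors of M (hypothetical helper)."""
    # M = U diag(s) Vt; rows of Vt are right-singular vectors.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    V_k = Vt[:k].T                      # shape (d, k), orthonormal columns
    delta_W = G - V_k @ (V_k.T @ G)     # so that V_k.T @ delta_W = 0
    return delta_W, V_k


rng = np.random.default_rng(0)

# Assumed toy sizes: 50 samples, 20 input features, rank 5, 8 output units.
n, d, r, out = 50, 20, 5, 8

# Synthetic low-rank layer input standing in for data from an earlier task.
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))

# Stand-in for the raw update proposed while learning a new task.
G = rng.standard_normal((d, out))

delta_W, V_k = project_out(X, G, k=r)

# Effect of each update on the old task's next-layer input X @ W.
change_raw = np.linalg.norm(X @ G)
change_projected = np.linalg.norm(X @ delta_W)

print("change with raw update:      ", change_raw)
print("change with projected update:", change_projected)
```

Because `X` is constructed with exact rank `r` and `k = r`, the projected update changes `X @ W` only at the level of floating-point error. With real activations, the rank-`k` approximation is truncated, so the change would be small rather than zero, which matches the "approximately zero" wording in the caption. The caption suggests the same kind of projection is applied with $\frac{\partial\mathbf{X} ^{l} }{\partial\mathbf{X}}$ in place of $\mathbf{X}^{l}$ to stabilize sample gradients; how the two bases are combined is not specified here, so that step is left out of the sketch.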