Maintaining Adversarial Robustness in Continuous Learning
Xiaolei Ru, Xiaowei Cao, Zijia Liu, Jack Murdoch Moore, Xin-Ya Zhang, Xia Zhu, Wenjia Wei, Gang Yan
TL;DR
This paper tackles the challenge of preserving adversarial robustness when a neural network learns a sequence of tasks without revisiting previous data. It introduces Double Gradient Projection (DGP), a gradient-projection framework that orthogonally constrains weight updates to stabilize the gradients of prior tasks. DGP works together with defense methods such as input-gradient regularization (IGR) and adversarial training (AT). By performing a layer-wise SVD to derive bases that stabilize both the final outputs and the sample gradients, DGP maintains robustness across multiple benchmarks (Permuted MNIST, Rotated MNIST, Split-CIFAR100, Split-miniImageNet) and under attacks such as AutoAttack, PGD, and FGSM, while preserving competitive continual-learning performance. The work also reveals incompatibilities that arise when defense methods are naively combined with existing continual-learning baselines, and it discusses limitations in plasticity as the pool of basis vectors grows, offering a principled path toward robust continual learning in real-world deployments.
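The layer-wise SVD and projection steps can be illustrated with a minimal NumPy sketch in the style of Gradient Projection Memory. It covers only the output-stabilizing half of the idea: a basis is extracted from an SVD of one layer's inputs on a previous task, and a new-task weight gradient is stripped of its component in that span before the update. The helper names `layer_basis` and `project_gradient`, the energy threshold, and the random stand-in data are assumptions for illustration; DGP's additional bases for stabilizing sample gradients are not modeled here.

```python
import numpy as np

def layer_basis(reps, energy=0.97):
    """Orthonormal basis (as columns) for the dominant subspace of a layer's inputs.

    reps: (d, n) array; each column is one previous-task sample's input to the layer.
    energy: fraction of squared singular-value energy to retain (hypothetical value).
    """
    u, s, _ = np.linalg.svd(reps, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cumulative, energy)) + 1  # smallest k reaching the threshold
    return u[:, :k]

def project_gradient(grad, basis):
    """Remove the part of a weight gradient that acts on span(basis).

    grad: (out, d) gradient w.r.t. W for a layer computing W @ x.
    basis: (d, k) orthonormal columns.
    """
    return grad - grad @ basis @ basis.T

rng = np.random.default_rng(0)
d, n = 8, 50
# Hypothetical previous-task inputs that lie in a 3-dimensional subspace of R^8.
old_inputs = rng.normal(size=(d, 3)) @ rng.normal(size=(3, n))
basis = layer_basis(old_inputs, energy=1.0 - 1e-9)  # keep essentially all energy

W = rng.normal(size=(4, d))
grad = rng.normal(size=(4, d))  # stand-in for a new-task gradient
W_new = W - 0.1 * project_gradient(grad, basis)

print(basis.shape)                                      # (8, 3)
print(np.allclose(W_new @ old_inputs, W @ old_inputs))  # True
```

Because the projected update annihilates every vector in the stored span, the layer's outputs on the old inputs are unchanged up to floating-point error. The price is that new tasks can only move the weights along the remaining directions, which is why plasticity shrinks as the pool of basis vectors grows.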
Abstract
Adversarial robustness is essential for the security and reliability of machine learning systems. However, the adversarial robustness gained through defense algorithms is easily erased as the neural network's weights are updated to learn new tasks. To address this vulnerability, it is essential to strengthen the ability of neural networks to learn continually while remaining robust. Specifically, we propose a novel gradient projection technique that effectively stabilizes the sample gradients of previous data by orthogonally projecting back-propagation gradients onto a crucial subspace before using them for weight updates. This technique can maintain robustness by collaborating with a class of defense algorithms based on sample-gradient smoothing. Experimental results on four benchmarks, including Split-CIFAR100 and Split-miniImageNet, demonstrate the superiority of the proposed approach in mitigating the rapid degradation of robustness during continual learning, even under strong adversarial attacks.
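Sample-gradient smoothing can be made concrete with input-gradient regularization (IGR), one of the defenses named in the TL;DR, which adds the squared norm of the loss gradient with respect to the input to the training loss. The sketch below assumes a plain logistic-regression model so that the input gradient has a closed form, and checks that form against finite differences; the function `igr_loss`, the penalty weight `lam`, and the toy data are hypothetical and do not reproduce the paper's training setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def igr_loss(w, b, X, y, lam=0.1):
    """Logistic loss plus an input-gradient penalty (hypothetical lam).

    X: (n, d) inputs; y: (n,) labels in {-1, +1}.
    For l(x) = log(1 + exp(-y (w.x + b))), the input gradient is
    dl/dx = -y * sigmoid(-y (w.x + b)) * w.
    """
    margins = y * (X @ w + b)
    ce = np.mean(np.logaddexp(0.0, -margins))  # numerically stable log(1 + e^-m)
    input_grads = -(y * sigmoid(-margins))[:, None] * w[None, :]  # (n, d)
    penalty = np.mean(np.sum(input_grads**2, axis=1))  # mean squared gradient norm
    return ce + lam * penalty, input_grads

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
y = np.array([1, -1, 1, 1, -1])
w, b = rng.normal(size=3), 0.2
loss, g = igr_loss(w, b, X, y)

def sample_loss(x, label):
    return np.logaddexp(0.0, -label * (x @ w + b))

# Central finite differences of sample 0's loss along each input coordinate.
eps = 1e-6
fd = np.array([(sample_loss(X[0] + eps * e, y[0]) - sample_loss(X[0] - eps * e, y[0])) / (2 * eps)
               for e in np.eye(3)])
print(np.allclose(fd, g[0], atol=1e-6))  # True
```

Minimizing the penalty flattens the loss surface around each training input, which is the smoothing effect that a gradient projection must then keep stable for earlier tasks while later tasks are learned.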
