On the Implicit Adversariality of Catastrophic Forgetting in Deep Continual Learning

Ze Peng, Jian Zhang, Jintao Guo, Lei Qi, Yang Gao, Yinghuan Shi

TL;DR

This work reveals that catastrophic forgetting in deep continual learning is driven by an implicit adversariality where new-task updates align with old-task high-curvature directions. Depth and a low-rank bias in old-task weights funnel forward and backward propagations into a shared low-dimensional subspace, enabling persistent alignment and rapid forgetting. Gradient Projection methods mitigate forward alignment but leave backward alignment unaddressed; the authors propose backGP to constrain backward updates, achieving substantial improvements across standard CL benchmarks and further gains when combined with plasticity-enhancing regularizers. The findings connect continual learning with adversarial robustness and offer practical strategies and theoretical insight for transfer learning and foundation-model fine-tuning scenarios.

Abstract

Continual learning seeks the human-like ability to accumulate new skills in machine intelligence. Its central challenge is catastrophic forgetting, whose underlying cause has not been fully understood for deep networks. In this paper, we demystify catastrophic forgetting by revealing that the new-task training is implicitly an adversarial attack against the old-task knowledge. Specifically, the new-task gradients automatically and accurately align with the sharp directions of the old-task loss landscape, rapidly increasing the old-task loss. This adversarial alignment is intriguingly counter-intuitive because the sharp directions are too sparsely distributed to align with by chance. To understand it, we theoretically show that it arises from training's low-rank bias, which, through forward and backward propagation, confines the two directions into the same low-dimensional subspace, facilitating alignment. Gradient projection (GP) methods, a representative family of forgetting-mitigating methods, reduce adversarial alignment caused by forward propagation, but cannot address the alignment due to backward propagation. We propose backGP to address it, which reduces forgetting by 10.8% and improves accuracy by 12.7% on average over GP methods.
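The alignment claim can be made concrete with a small numerical sketch. The abstract does not state the alignment measure, so the code below assumes a normalized curvature $\alpha = \Delta^\top \bm{H} \Delta / (\mathrm{tr}(\bm{H})\,\|\Delta\|^2)$; the names `alignment`, `second_order_forgetting`, `J`, and `H_old` are illustrative, and the low-rank Hessian is synthetic rather than taken from any experiment in the paper. It compares an update confined to the same low-dimensional subspace as the sharp directions with an isotropic random perturbation.

```python
import numpy as np

def alignment(hessian, update):
    """Assumed alignment measure: (d^T H d) / (tr(H) * ||d||^2), in [0, 1] for PSD H."""
    return float(update @ hessian @ update) / (np.trace(hessian) * float(update @ update))

def second_order_forgetting(hessian, update):
    """Second-order estimate of old-task loss increase at an old-task minimum."""
    return 0.5 * float(update @ hessian @ update)

rng = np.random.default_rng(0)
dim, rank = 500, 5

# Hypothetical low-rank factor standing in for a shared Jacobian J (dim x rank).
J = rng.standard_normal((dim, rank))
H_old = J @ J.T  # synthetic PSD old-task Hessian with only `rank` sharp directions

# New-task update confined to the column space of J, versus a random perturbation.
aligned_update = J @ rng.standard_normal(rank)
random_update = rng.standard_normal(dim)

# Rescale both to unit norm so that only the direction differs.
aligned_update /= np.linalg.norm(aligned_update)
random_update /= np.linalg.norm(random_update)

alpha_aligned = alignment(H_old, aligned_update)
alpha_random = alignment(H_old, random_update)
forget_aligned = second_order_forgetting(H_old, aligned_update)
forget_random = second_order_forgetting(H_old, random_update)

print(f"alignment: aligned={alpha_aligned:.4f}, random={alpha_random:.4f}")
print(f"forgetting: aligned={forget_aligned:.2f}, random={forget_random:.2f}")
```

Under this assumed definition the second-order forgetting factors as $\tfrac{1}{2}\,\alpha\,\|\Delta\|^2\,\mathrm{tr}(\bm{H})$, so with the update norm and Hessian trace held fixed, the gap in forgetting between the two updates comes entirely from the alignment term.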

Paper Structure

This paper contains 39 sections, 24 theorems, 146 equations, 8 figures, 3 tables.

Key Result

Proposition 1

Let $\bm{A}$ and $\bm{B}$ be two symmetric PSD matrices. Then Let $\left\{\bm{A}_i\right\}$ be a sequence of symmetric PSD matrices. Then

Figures (8)

  • Figure 1: Illustration of adversarial alignment's definition, influence, counter-intuitiveness, cause, and mitigation. \ref{fig:illustration_alignment} illustrates the definition of alignment using an example of aligned new-task updates, contrasted with unaligned updates. \ref{fig:illustration_alignment} also shows that aligned new-task updates lead to a large increase in old-task loss, i.e., catastrophic forgetting, while unaligned new-task updates do not. \ref{fig:illustration_adversarial} shows what an intuitive preliminary analysis would expect, i.e., that the new-task updates and the sparsely distributed old-task high-curvature directions should not persistently align in the high-dimensional weight space. This expectation does not match reality, which indicates that the alignment is counter-intuitive. \ref{fig:illustration_causes} illustrates the cause of adversarial alignment, i.e., both directions have the low-rank Jacobian $\bm{J}$ as a common factor, which confines them to the same low-dimensional subspace (the column space of $\bm{J}$), where alignment is much easier. \ref{fig:illustration_mitigation_of_aa} illustrates how existing GP methods and our backGP methods mitigate the adversarial alignment (at least for deep linear networks). Details can be found in \ref{sec:analysis_of_existing_methods} and \ref{sec:resolution_of_limitations}.
  • Figure 2: Empirical evidence of adversarial alignment. Cumulative distribution functions (CDFs, top) show that the projection of new-task updates onto high-curvature directions of old tasks is disproportionately high across datasets and architectures, while that of random perturbations is not. Box plots (bottom) track the persistence of this alignment during the early steps of new-task training. Results are shown for (a) CIFAR-100 (10-split), (b) randomly rotated whitened MNIST (synthetic), and (c) cross-modal CL (old task: first split of 10-split CIFAR-100 for image classification; new task: SST2 for sentiment analysis). See \ref{sec:verification} for full details. We observe that the new-task update has a large projection onto the eigenvectors of large curvatures $\sim 10^{0}$ compared to the baseline, even though such directions are sparse (see the baseline's flat CDF) and the tasks have different data (e.g., cross-modal).
  • Figure 3: Connection between adversarial alignment and forgetting. We present the forgetting (old-task loss increase) recorded during the experiments in \ref{figure:existence}. Actual forgetting (black) rises sharply with new-task training. Its second-order approximation (green) captures this rise, especially early in new-task training, while random perturbations (orange) induce negligible forgetting. First-order approximations (blue) capture little of the effect or even predict negative forgetting. Results are averaged over 5 runs. The experimental settings are the same as in \ref{figure:existence}, and the results are recorded in the same experiments. Full details can be found in \ref{sec:verification}.
  • Figure 4: Verification of adversarial alignment lower-bounds. \ref{figure:verification_tightness} shows the correlation between the lower-bound and the estimated $\alpha$ in each experiment. It shows that the lower-bound (1) is lower than the estimated $\alpha$, (2) is well correlated with the estimated $\alpha$, and (3) is tight up to constant factors within the scope of the experiments. \ref{figure:phase_transition_1}, \ref{figure:phase_transition_2}, and \ref{figure:phase_transition_46810} verify the phase transition predicted by the lower-bound. The experiments are conducted on the whitened MNIST dataset, with a random rotation of the old task as the new task. The rank is controlled by taking labels modulo rank $r$. When $L=1$, the alignment is not related to the rank of $\bm{\Phi}_1$. When $L\ge 2$, the alignment is inversely proportional to the rank of $\bm{\Phi}_1$. For each depth-rank configuration, we run the experiment 5 times. The 10-rank results are recorded in the experiment for \ref{figure:existence_randrot_mnist}.
  • Figure 5: Effectiveness of algorithms through the lens of \ref{eq:alignment_and_forgetting}. We run vanilla training (without any forgetting mitigation) and AdamNSCL with various regularizers or with backGP on 10-split CIFAR-100. At each task, we compute the adversarial alignment with old tasks, the weight difference, and the Hessian of old tasks. Two kinds of old tasks are considered, i.e., the one immediately before the current task, and the most forgotten task (the one with the largest loss increase), task $2$. Data are recorded every 4000 steps. Regarding adversarial alignment, we observe that (1) AdamNSCL reduces adversarial alignment compared to the vanilla method; (2) even when AdamNSCL is used, adversarial alignment still exists, i.e., residual adversariality; (3) spectral regularization also reduces adversarial alignment but leaves residual adversariality; (4) backGP further reduces the residual adversarial alignment. Regarding the other two factors, we observe that (1) spectral regularization reduces all update norms as well as the Hessian traces of tasks $6 \sim 10$, while forward or backward GPs do not change them drastically; (2) all tested methods affect early tasks' Hessian traces in the opposite direction to the alignment, leading to a tendency to increase forgetting. Through the lens of \ref{eq:alignment_and_forgetting}, we conclude that forward GPs and our backward GPs mitigate forgetting precisely by reducing the adversarial alignment, rather than by affecting the other two factors. Results in these figures are recorded during the experiment of \ref{table:comparison}.
  • ...and 3 more figures
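
The distinction between forward and backward gradient projection drawn in the Figure 5 caption can be sketched for a single linear layer. This is a minimal illustration under stated assumptions, not the authors' algorithm: the exact backGP construction is not given in this summary, so the output-side projector below is an assumed analogue of input-side GP. The names `null_projector`, `X_old`, `D_old`, and `G` are hypothetical, and all data are random.

```python
import numpy as np

def null_projector(basis):
    """Projector onto the orthogonal complement of the column span of `basis`."""
    q, _ = np.linalg.qr(basis)  # orthonormal basis of the protected subspace
    return np.eye(basis.shape[0]) - q @ q.T

rng = np.random.default_rng(0)
d_out, d_in, k = 8, 12, 3

X_old = rng.standard_normal((d_in, k))   # hypothetical old-task layer inputs (forward signals)
D_old = rng.standard_normal((d_out, k))  # hypothetical old-task output-side gradients (backward signals)
G = rng.standard_normal((d_out, d_in))   # raw new-task gradient for a weight matrix W (d_out x d_in)

P_in = null_projector(X_old)   # removes update components that act on old-task inputs
P_out = null_projector(D_old)  # assumed backward analogue: removes components along old-task backward signals

G_forward = G @ P_in        # input-side (forward) projection only
G_both = P_out @ G @ P_in   # input-side plus assumed output-side (backward) projection

print("old inputs unaffected by forward projection:", np.allclose(G_forward @ X_old, 0.0))
print("backward signals still see forward-only update:", not np.allclose(D_old.T @ G_forward, 0.0))
print("backward signals blocked after two-sided projection:", np.allclose(D_old.T @ G_both, 0.0))
```

In this sketch the forward projection leaves the layer's response to old-task inputs unchanged, yet the update can still overlap with old-task backward signals; the second projector removes that remaining overlap, which is one way to read the residual alignment described in the caption.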

Theorems & Definitions (49)

  • Definition 1
  • Definition 2
  • Proposition 1
  • Proposition 2
  • proof
  • Lemma 1: von Neumann's trace inequality majorization
  • Lemma 2
  • proof
  • Corollary 3
  • proof
  • ...and 39 more