Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity

Amir Joudaki, Giulia Lanzillotta, Mohammad Samragh Razlighi, Iman Mirzadeh, Keivan Alizadeh, Thomas Hofmann, Mehrdad Farajtabar, Fartash Faghri

TL;DR

The paper presents a first-principles dynamical-systems framework for Loss of Plasticity (LoP) in gradient-based continual learning, defining LoP via stable manifolds in parameter space that trap optimization trajectories. It identifies two core trapping mechanisms—frozen units from activation saturation and cloned-unit manifolds from representational redundancy—and shows that low-dimensional, low-rank representations common in static generalization can impede adaptation to non-stationary tasks. A rank-dynamics perspective links nonlinear activations to increases in effective rank and to the emergence of LoP symptoms, with a formal theorem describing how nonlinearities drive rank gains and promote feature cloning or dead units. The study validates the theory through experiments across MLPs, CNNs, ResNets, and ViTs on continual Tiny ImageNet tasks, and demonstrates mitigation via normalization and escape through perturbations like Noisy SGD or Dropout, including Continual Backpropagation (CBP) in non-stationary settings. The findings highlight a tension between static generalization biases and continual adaptability, offering a mathematical grounding for designing architectures and training procedures that preserve plasticity in evolving environments.

Abstract

Deep learning models excel on stationary data but struggle in non-stationary environments due to a phenomenon known as loss of plasticity (LoP), the degradation of their ability to learn in the future. This work presents a first-principles investigation of LoP in gradient-based learning. Grounded in dynamical systems theory, we formally define LoP by identifying stable manifolds in the parameter space that trap gradient trajectories. Our analysis reveals two primary mechanisms that create these traps: frozen units from activation saturation and cloned-unit manifolds from representational redundancy. Our framework uncovers a fundamental tension: properties that promote generalization in static settings, such as low-rank representations and simplicity biases, directly contribute to LoP in continual learning scenarios. We validate our theoretical analysis with numerical simulations and explore architectural choices or targeted perturbations as potential mitigation strategies.
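
The two trapping mechanisms named in the abstract can be illustrated on a toy network. The following is a minimal sketch, not the paper's experimental setup: the one-input network, the data, the learning rate, and the helper names (`sgd_step`, `relu`, `relu_grad`, `tanh_grad`) are all assumptions made for illustration. It checks that a ReLU unit whose pre-activation is negative on every input receives exactly zero gradient, and that two identically initialized hidden units receive identical gradients, so plain full-batch gradient descent leaves both configurations in place.

```python
import math


def relu(z):
    return max(z, 0.0)


def relu_grad(z):
    return 1.0 if z > 0.0 else 0.0


def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2


def sgd_step(params, data, act, act_grad, lr=0.1):
    """One full-batch gradient step on 0.5 * mean squared error.

    Hypothetical toy model: scalar input, H hidden units, scalar output.
    params holds one [w, b, v] triple per hidden unit
    (input weight, bias, output weight).
    """
    n = len(data)
    grads = [[0.0, 0.0, 0.0] for _ in params]
    for x, y in data:
        pre = [w * x + b for w, b, _ in params]
        hid = [act(z) for z in pre]
        err = sum(v * h for (_, _, v), h in zip(params, hid)) - y
        for g, (w, b, v), z, h in zip(grads, params, pre, hid):
            back = err * v * act_grad(z)  # gradient w.r.t. this unit's pre-activation
            g[0] += back * x / n          # gradient w.r.t. input weight w
            g[1] += back / n              # gradient w.r.t. bias b
            g[2] += err * h / n           # gradient w.r.t. output weight v
    return [[p - lr * gp for p, gp in zip(unit, g)]
            for unit, g in zip(params, grads)]


# Illustrative data (an assumption): inputs in [0, 1], arbitrary targets.
data = [(k / 4, math.sin(k)) for k in range(5)]

# Frozen unit: unit 0 has pre-activation x - 5 < 0 on every input, so ReLU
# outputs 0 and all three of its gradients vanish.
frozen = [[1.0, -5.0, 0.7], [0.5, 0.1, -0.3]]
for _ in range(50):
    frozen = sgd_step(frozen, data, relu, relu_grad)

# Cloned units: both hidden units start with identical parameters, so they
# receive identical gradients and stay identical under gradient descent.
cloned = [[0.8, 0.2, 0.5], [0.8, 0.2, 0.5]]
for _ in range(50):
    cloned = sgd_step(cloned, data, math.tanh, tanh_grad)

print("frozen unit after training:", frozen[0])
print("cloned units still equal:", cloned[0] == cloned[1])
```

In this toy setting the cloned units do train, but only as a single effective unit. Escaping either configuration would require breaking the symmetry or the saturation from outside the gradient signal, which is the role the summary attributes to perturbations such as Noisy SGD or Dropout.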

Paper Structure

This paper contains 53 sections, 10 theorems, 23 equations, 18 figures.

Key Result

Theorem 2.1

Let $G=(V,E)$ be the network’s computational DAG and let $\theta=\{\theta_{uv}:(u\to v)\in E\}\in\Theta$ denote the edge parameters.

Figures (18)

  • Figure 2.1: Cloning MLPs experiments. The empirical data validate the proposition on duplicate-manifold LoP. Under SGD, the dynamics of the cloned network remain confined to the base network manifold; with Noisy SGD or Dropout, the dynamics can escape the manifold. Left: The cloning $R^2$ score quantifies the proportion of variance in individual unit activations within a cloned block that is explained by the mean activation of that block. An $R^2$ score of 1 indicates perfect cloning (units in a block are nearly identical), while a score of 0 indicates no explained variance. See the core methodologies subsection of the empirical evidence appendix for the precise formula and calculation details. Middle: Training loss comparison. Cloned loss refers to the loss of the cloned model during its training phase, while base loss refers to the loss of the original base model, which continues training for comparison. Right: Effective rank evolution, showing representational diversity.
  • Figure 3.1: Causes and symptoms of Loss of Plasticity emerging during continual learning. The plots show, from left to right for the MLP, CNN, ResNet, and ViT architectures, an increase in the fraction of dead or duplicate units during training, coinciding with a decrease in training accuracy. These are key indicators of LoP. (Details of the experimental setup are in the empirical evidence appendix.)
  • Figure 3.2: Co-evolution of effective rank and LoP symptoms, such as dead or duplicate units in the network, during continual training. (Experimental details are in the empirical evidence appendix.)
  • Figure 4.1: Evolution of the effective rank during training for architectures with and without normalization layers. Dotted lines represent normalization with affine parameters. (Experimental details are in the empirical evidence appendix.)
  • Figure B.1: Bit Flipping experiment on 5M samples, switching from SGD to CBP at 2.5M samples. Low-rank structures emerge during training with standard Backpropagation (SGD), but after the switch Continual Backpropagation (CBP) is able to recover representational diversity, suggesting that CBP-like training could be effective for cloning too. (Experimental details are in the empirical evidence appendix.)
  • ...and 13 more figures
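
The cloning $R^2$ score described in the Figure 2.1 caption can be sketched as follows. The precise formula is in the paper's appendix and is not reproduced here, so this is only one assumed reading of the caption: the block mean serves as the predictor for every unit in the block, and the score is one minus the ratio of residual to total sum of squares. The function name `cloning_r2` and the input layout (a list of samples, each a list of unit activations for one cloned block) are hypothetical.

```python
def cloning_r2(block):
    """Assumed cloning R^2 for one block of units.

    block: list of samples; each sample is a list with one activation
    per unit in the block. Returns 1 - SS_res / SS_tot, where the block
    mean (per sample) is used as the prediction for every unit.
    """
    n_samples = len(block)
    n_units = len(block[0])
    # Mean activation of the block on each sample.
    block_means = [sum(row) / n_units for row in block]
    # Mean over all units and all samples.
    grand_mean = sum(block_means) / n_samples
    # Variance left unexplained by the block mean.
    ss_res = sum((a - m) ** 2 for row, m in zip(block, block_means) for a in row)
    # Total variance of the unit activations.
    ss_tot = sum((a - grand_mean) ** 2 for row in block for a in row)
    if ss_tot == 0.0:
        # Assumption: a block with constant activations counts as perfectly cloned.
        return 1.0
    return 1.0 - ss_res / ss_tot


# Two identical units: the block mean explains all of the variance.
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
# Two units that cancel out: the block mean explains none of the variance.
opposite = [[1.0, -1.0], [2.0, -2.0], [-3.0, 3.0]]

print(cloning_r2(identical))
print(cloning_r2(opposite))
```

Under this reading, a score near 1 means the units in a block move together across samples, which matches the caption's description of perfect cloning.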

Theorems & Definitions (30)

  • Definition 2.1: LoP Manifold
  • Remark 2.1
  • Theorem 2.1
  • Remark 2.2
  • Remark 2.3
  • Theorem 2.2: Modular Cloning (informal)
  • Theorem 3.1: rank gain across one linear–nonlinear step
  • Lemma A.1: basic properties of $K_\phi$
  • proof
  • Lemma A.2: entrywise action on Gaussian correlation matrices
  • ...and 20 more