Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity
Amir Joudaki, Giulia Lanzillotta, Mohammad Samragh Razlighi, Iman Mirzadeh, Keivan Alizadeh, Thomas Hofmann, Mehrdad Farajtabar, Fartash Faghri
TL;DR
The paper presents a first-principles dynamical-systems framework for Loss of Plasticity (LoP) in gradient-based continual learning, defining LoP via stable manifolds in parameter space that trap optimization trajectories. It identifies two core trapping mechanisms: frozen units, which arise from activation saturation, and cloned-unit manifolds, which arise from representational redundancy. It also shows that the low-dimensional, low-rank representations that favor generalization in static settings can impede adaptation to non-stationary tasks. A rank-dynamics perspective links nonlinear activations to increases in effective rank and to the emergence of LoP symptoms, with a formal theorem describing how nonlinearities drive rank gains and promote feature cloning or dead units. The study validates the theory with experiments on MLPs, CNNs, ResNets, and ViTs trained on continual Tiny ImageNet tasks. It demonstrates mitigation via normalization, and escape from these traps through perturbations such as Noisy SGD or Dropout, as well as Continual Backpropagation (CBP), in non-stationary settings. The findings highlight a tension between static generalization biases and continual adaptability, and offer a mathematical grounding for designing architectures and training procedures that preserve plasticity in evolving environments.
Abstract
Deep learning models excel on stationary data but struggle in non-stationary environments due to a phenomenon known as loss of plasticity (LoP), the degradation of their ability to learn in the future. This work presents a first-principles investigation of LoP in gradient-based learning. Grounded in dynamical systems theory, we formally define LoP by identifying stable manifolds in the parameter space that trap gradient trajectories. Our analysis reveals two primary mechanisms that create these traps: frozen units from activation saturation and cloned-unit manifolds from representational redundancy. Our framework uncovers a fundamental tension: properties that promote generalization in static settings, such as low-rank representations and simplicity biases, directly contribute to LoP in continual learning scenarios. We validate our theoretical analysis with numerical simulations and explore architectural choices or targeted perturbations as potential mitigation strategies.
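
The two trapping mechanisms named in the abstract can be illustrated with a small numerical sketch. This is not the paper's experimental setup: the one-hidden-layer ReLU network, the random regression data, the learning rate, and every name in the code are assumptions chosen for illustration. A dead ReLU unit stands in for a frozen unit, and two hidden units with identical incoming and outgoing weights stand in for a cloned pair. The sketch only checks invariance (plain gradient descent started on these manifolds stays on them); it does not test the paper's stability claims.

```python
import numpy as np

def loss_and_grads(W1, b1, W2, X, Y):
    """Mean squared error and its gradients for a one-hidden-layer ReLU network."""
    Z = X @ W1 + b1                  # pre-activations, shape (n, h)
    H = np.maximum(Z, 0.0)           # ReLU activations
    err = H @ W2 - Y                 # residuals, shape (n, o)
    n = X.shape[0]
    loss = 0.5 * np.sum(err ** 2) / n
    gW2 = H.T @ err / n
    gZ = (err @ W2.T) * (Z > 0)      # gradient is zero wherever the unit is inactive
    gW1 = X.T @ gZ / n
    gb1 = gZ.sum(axis=0) / n
    return loss, gW1, gb1, gW2

# Hypothetical toy problem: sizes and data are arbitrary assumptions.
rng = np.random.default_rng(0)
n, d, h, o = 64, 5, 4, 2
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, o))
W1 = 0.5 * rng.normal(size=(d, h))
b1 = np.zeros(h)
W2 = 0.5 * rng.normal(size=(h, o))

# Cloned pair: hidden unit 1 copies unit 0 (same incoming and outgoing weights).
W1[:, 1] = W1[:, 0]
W2[1, :] = W2[0, :]

# Frozen unit: hidden unit 2 is inactive on every input (zero weights, negative bias).
W1[:, 2] = 0.0
b1[2] = -10.0
frozen_before = (W1[:, 2].copy(), b1[2], W2[2, :].copy())

# Plain full-batch gradient descent, with no noise, dropout, or resets.
lr = 0.05
for _ in range(50):
    loss, gW1, gb1, gW2 = loss_and_grads(W1, b1, W2, X, Y)
    W1 -= lr * gW1
    b1 -= lr * gb1
    W2 -= lr * gW2

# Largest parameter difference between the two clones after training.
clone_gap = max(
    np.abs(W1[:, 0] - W1[:, 1]).max(),
    abs(b1[0] - b1[1]),
    np.abs(W2[0, :] - W2[1, :]).max(),
)
# Whether any parameter of the frozen unit changed at all.
frozen_moved = not (
    np.array_equal(W1[:, 2], frozen_before[0])
    and b1[2] == frozen_before[1]
    and np.array_equal(W2[2, :], frozen_before[2])
)
print(f"final loss: {loss:.4f}")
print(f"clone gap: {clone_gap:.2e}, frozen unit moved: {frozen_moved}")
```

Because the two clones receive the same gradients at every step, their parameters remain equal up to floating-point error, and the frozen unit receives exactly zero gradient. In this toy setting the network therefore trains as if it had fewer hidden units than it was allocated, which is the sense in which such manifolds limit future learning. Perturbations of the kind mentioned in the abstract would break this symmetry; they are deliberately omitted here.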
