
On the Occurrence of Critical Learning Periods in Neural Networks

Stanisław Pawlak

TL;DR

The paper investigates critical learning periods in neural network training and their relation to warm-starting strategies. By replicating and extending the core experiments of Achille et al. on a ResNet-18 trained on CIFAR-10 with CIFAR-10-C corruptions, it shows that deficits early in training reduce plasticity, but that a cyclic learning-rate schedule can restore adaptability and close most of the resulting performance gap. It reinterprets warm-starting as deficit pretraining, shows how corruption severity affects the critical learning period, and reveals that class-targeted deficits induce class-dependent forgetting. The study connects plasticity phenomena with practical training dynamics and offers actionable guidance for avoiding loss of plasticity during continual learning.
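Viewing warm-starting as deficit pretraining amounts to a simple data schedule: for the first few epochs the model sees only a fixed subset of the training set, and afterwards the full set. The Python sketch below illustrates that schedule under stated assumptions; the function name `epoch_indices` and its parameters (`deficit_epochs`, `subset_fraction`, `seed`) are hypothetical and not taken from the paper's code.

```python
import random
from typing import List


def epoch_indices(epoch: int, num_examples: int, deficit_epochs: int,
                  subset_fraction: float, seed: int = 0) -> List[int]:
    """Return the training-example indices used in a given (0-based) epoch.

    Hypothetical sketch: during the first `deficit_epochs` epochs only a fixed
    random subset is used (warm-starting viewed as deficit pretraining);
    from then on the full training set is used.
    """
    all_indices = list(range(num_examples))
    if epoch < deficit_epochs:
        # Re-seeding with the same seed yields the same subset every deficit epoch.
        rng = random.Random(seed)
        subset_size = max(1, int(num_examples * subset_fraction))
        return sorted(rng.sample(all_indices, subset_size))
    # Clean phase: the whole training set.
    return all_indices


# Illustrative values only: CIFAR-10 has 50,000 training images;
# 20 deficit epochs on a 10% subset are an assumption for this example.
for epoch in (0, 19, 20):
    print(epoch, len(epoch_indices(epoch, 50_000, 20, 0.1)))
```

Keeping the subset fixed across deficit epochs is what makes this a warm start on a smaller dataset rather than ordinary minibatch subsampling.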

Abstract

This study examines the plasticity of neural networks and offers empirical support for the claim that critical learning periods and the performance loss caused by warm-starting can be avoided through simple adjustments to learning hyperparameters. The critical learning period phenomenon emerges when training begins on deficit data: after many deficit epochs, the network's plasticity wanes and it cannot match the accuracy of a model trained from scratch, even when extensive training on clean data follows the deficit epochs. Building on the seminal work that introduced critical learning periods, we replicate its key findings and broaden the scope of its main experiment. We also consider warm-starting and show that it can be viewed as a form of deficit pretraining. Most importantly, we demonstrate that both problems can be averted by employing a cyclic learning rate schedule. Our findings bear on neural network training practice and establish a link between critical learning periods and ongoing research on warm-starting neural network training.
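The learning-rate fix can be pictured as a restart at the point where the deficit is removed. The following is a minimal sketch, assuming exponential per-epoch decay within each phase and a single restart at the deficit/clean boundary; the function `lr_at_epoch`, the decay form, and all numeric values are illustrative assumptions, not the paper's exact cyclic schedule.

```python
from typing import Optional


def lr_at_epoch(epoch: int, deficit_epochs: int, base_lr: float,
                decay: float, restart_lr: Optional[float] = None) -> float:
    """Per-epoch learning rate, optionally restarted when the deficit ends.

    Hypothetical sketch: the rate decays exponentially by `decay` each epoch.
    With restart_lr=None the decay simply continues into the clean phase;
    otherwise it jumps back up to restart_lr at epoch `deficit_epochs`
    and decays again from there.
    """
    if restart_lr is None or epoch < deficit_epochs:
        # Deficit phase, or no restart: one continuous decay from epoch 0.
        return base_lr * decay ** epoch
    # Clean phase with restart: the decay starts over from restart_lr.
    return restart_lr * decay ** (epoch - deficit_epochs)


# Illustrative values only: 40 deficit epochs, base lr 0.1, decay 0.97, restart at 0.05.
for epoch in (0, 39, 40, 41):
    no_restart = lr_at_epoch(epoch, 40, 0.1, 0.97)
    with_restart = lr_at_epoch(epoch, 40, 0.1, 0.97, restart_lr=0.05)
    print(epoch, round(no_restart, 5), round(with_restart, 5))
```

Without the restart, the clean phase starts with an already small learning rate; the restart gives the network a large step size again at the moment the data changes.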

Paper Structure

This paper contains 11 sections, 12 figures.

Figures (12)

  • Figure 1: Restarting the learning rate alleviates plasticity problems after deficit training epochs. a) Restarting the learning rate after the deficit epochs almost closes the performance gap between the deficit-trained model and the model trained from scratch on clean data. b) Restarting also closes the gap when the deficit is pretraining on a small subset of the data (warm-starting). The image above the plot is reproduced from the original work (Achille et al., 2019).
  • Figure 2: Final model accuracy correlates with the learning rate used after the deficit epochs (restart lr). In both cases, the larger the restart lr, the better the final result. Results are also worse when the original learning rate is larger than the rate it is restarted to after the last deficit epoch (e.g., 20 and 40 deficit epochs for lr=0.001).
  • Figure 3: Corruption severity correlates with the size of the performance gap. Some standardized corruption types produce a larger gap and greater sensitivity to the corruption level (pixelate > Gaussian blur > Gaussian noise).
  • Figure 4: Smaller subset size in deficit training increases the performance gap.
  • Figure 8: Differences in confusion matrices for runs in which only 1, 2, 5, or 10 classes receive deficit data. Deficit classes lie to the left of and above the red lines. Clean-class examples are frequently misclassified as deficit-pretrained classes.
  • ...and 7 more figures
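The comparison in Figure 8 can be approximated by subtracting a clean run's confusion matrix from a deficit run's. The sketch below assumes, hypothetically, that the deficit classes are indexed 0 to k-1 (matching their placement before the red lines), that rows are true classes and columns are predicted classes, and that the helper names are illustrative rather than the paper's code.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(labels: Sequence[int], preds: Sequence[int],
                     num_classes: int) -> Matrix:
    """Count matrix: row = true class, column = predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for y, p in zip(labels, preds):
        cm[y][p] += 1
    return cm


def confusion_difference(deficit_cm: Matrix, clean_cm: Matrix) -> Matrix:
    """Element-wise deficit-run counts minus clean-run counts."""
    return [[d - c for d, c in zip(d_row, c_row)]
            for d_row, c_row in zip(deficit_cm, clean_cm)]


def extra_predictions_into_deficit_classes(diff: Matrix,
                                           num_deficit_classes: int) -> int:
    """Net change in clean-class examples predicted as a deficit class.

    Assumes (hypothetically) that deficit classes are indices 0..k-1.
    Sums the block with clean true classes (rows >= k) and
    deficit predicted classes (columns < k).
    """
    k = num_deficit_classes
    return sum(diff[i][j] for i in range(k, len(diff)) for j in range(k))
```

A positive total means the deficit-trained model sends more clean-class examples into the deficit classes than the clean-trained model does, which is the pattern the Figure 8 caption describes.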