Neural Network Plasticity and Loss Sharpness

Max Koster, Jude Kukla

TL;DR

This work investigates whether sharpness-based regularization can mitigate plasticity loss in continual learning (CL) under non-stationary tasks. By evaluating Sharpness-Aware Minimization (SAM) and Gradient Norm Penalty (GNP) on permuted MNIST in both domain- and class-incremental settings, the study finds that these techniques do not reduce plasticity loss and can even worsen it in some cases. Across 100 domain tasks and 45 class tasks, SGD generally maintains stability, while SAM often degrades performance and GNP shows potential in class-incremental scenarios but not universally. The results challenge the notion that sharpness regularization alone suffices for non-stationary continual learning and motivate testing on more diverse benchmarks and RL contexts, possibly in combination with other CL strategies.

Abstract

In recent years, continual learning, a prediction setting in which the problem environment may evolve over time, has become an increasingly popular research field because the framework is geared towards complex, non-stationary objectives. Learning such objectives requires plasticity, the ability of a neural network to adapt its predictions to a new task. Recent findings indicate that plasticity loss on new tasks is closely related to loss landscape sharpness in non-stationary RL frameworks. We explore the use of sharpness regularization techniques, which seek out smooth minima and have been touted for their generalization capabilities in vanilla prediction settings, to combat plasticity loss. Our findings indicate that such techniques have no significant effect on reducing plasticity loss.
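To make the two sharpness regularizers concrete, the following is a minimal sketch of how their update directions are commonly computed, not the authors' implementation. It assumes the standard first-order forms: SAM evaluates the gradient at a point perturbed by radius `rho` along the normalized gradient, and the gradient norm penalty blends the plain and perturbed gradients with a weight `alpha`. The toy quadratic loss, the helper names, and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import math


def toy_grad(w):
    """Gradient of the toy loss 0.5 * (10 * w[0]**2 + w[1]**2) (hypothetical example)."""
    return [10.0 * w[0], w[1]]


def _ascent_point(w, g, radius):
    """Move w a distance `radius` along the normalized gradient direction."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # No ascent direction exists at a stationary point; leave w unchanged.
        return list(w)
    return [wi + radius * gi / norm for wi, gi in zip(w, g)]


def sam_gradient(grad_fn, w, rho=0.05):
    """SAM-style gradient: the gradient at the approximate worst-case neighbor of w."""
    g = grad_fn(w)
    return grad_fn(_ascent_point(w, g, rho))


def gnp_gradient(grad_fn, w, r=0.05, alpha=0.5):
    """First-order gradient-norm-penalty gradient (assumed form).

    Blends the plain gradient with the perturbed gradient; alpha = 1
    recovers the SAM-style gradient and alpha = 0 recovers plain SGD.
    """
    g = grad_fn(w)
    g_adv = grad_fn(_ascent_point(w, g, r))
    return [(1 - alpha) * gi + alpha * gai for gi, gai in zip(g, g_adv)]


def sgd_step(w, g, lr=0.01):
    """Plain gradient descent step using whichever gradient estimate is supplied."""
    return [wi - lr * gi for wi, gi in zip(w, g)]


w = [1.0, 1.0]
w_sgd = sgd_step(w, toy_grad(w))
w_sam = sgd_step(w, sam_gradient(toy_grad, w))
w_gnp = sgd_step(w, gnp_gradient(toy_grad, w))
```

All three optimizers share the same descent step and differ only in the gradient estimate they feed into it, so the comparison isolates the effect of the sharpness term.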

Paper Structure (15 sections, 6 equations, 5 figures, 2 tables)

This paper contains 15 sections, 6 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Blue and green regions represent the sets of parameters $\theta$ with high performance on two tasks $A$ and $B$, respectively. 1: The model learns parameters $\textbf{x}$ for task $A$ and achieves high accuracy. 2a: Following the introduction of $B$, the model learns some $\textbf{x}$ that yields high performance on $B$ but poor performance on $A$. 2b: The model maintains performance on $A$ but obtains poor performance on $B$. 2c: The model learns $\textbf{x}$ with good performance on both $A$ and $B$.
  • Figure 2: Nine class-incremental learning tasks
  • Figure 3: One domain-incremental learning task
  • Figure 4: Task-specific accuracies in the domain-incremental learning problem under different loss minimization schemes (10 runs)
  • Figure 5: Task-specific accuracies in the class-incremental learning problem under different loss minimization schemes (10 runs)