Neural Network Plasticity and Loss Sharpness

Max Koster; Jude Kukla

TL;DR

This work investigates whether sharpness-based regularization can mitigate plasticity loss in continual learning on non-stationary tasks. Evaluating Sharpness-Aware Minimization (SAM) and Gradient Norm Penalty (GNP) on permuted MNIST in both domain- and class-incremental settings, the study finds that these techniques do not reduce plasticity loss and in some cases worsen it. Across 100 domain-incremental tasks and 45 class-incremental tasks, plain SGD generally remains stable, SAM often degrades performance, and GNP shows promise in the class-incremental setting, though not consistently. The results challenge the notion that sharpness regularization alone suffices for non-stationary continual learning, and motivate tests on more diverse benchmarks and in reinforcement learning (RL), possibly combined with other continual learning (CL) strategies.
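For readers unfamiliar with the two regularizers, the sketch below contrasts one update step of plain SGD, SAM, and GNP on a hypothetical two-parameter quadratic loss. It follows commonly used formulations (SAM: perturb the weights by $\rho g/\lVert g\rVert$ and take the gradient at that point; GNP: add $\lambda\lVert\nabla L\rVert$ to the loss, with the Hessian-vector term approximated by a finite difference), not the authors' code. The toy loss and the values of `rho`, `lam`, and `r` are illustrative assumptions.

```python
import math

# Hypothetical toy loss: an anisotropic quadratic that is sharp along w[0]
# (curvature A) and flat along w[1] (curvature B). Not the paper's model.
A, B = 10.0, 1.0


def loss(w):
    return 0.5 * (A * w[0] ** 2 + B * w[1] ** 2)


def grad(w):
    return [A * w[0], B * w[1]]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def sgd_step(w, lr):
    # Plain gradient descent step.
    return [wi - lr * gi for wi, gi in zip(w, grad(w))]


def sam_step(w, lr, rho=0.05):
    # SAM: move to the first-order worst-case point w + rho * g / ||g||,
    # then update w with the gradient evaluated at that perturbed point.
    g = grad(w)
    gn = norm(g) + 1e-12  # guard against division by zero at a stationary point
    w_adv = [wi + rho * gi / gn for wi, gi in zip(w, g)]
    return [wi - lr * gi for wi, gi in zip(w, grad(w_adv))]


def gnp_step(w, lr, lam=0.02, r=0.05):
    # GNP: descend on L(w) + lam * ||grad L(w)||. The penalty's gradient is
    # lam * H g / ||g||, approximated by the finite difference
    # (grad L(w + r * g/||g||) - grad L(w)) / r.
    g = grad(w)
    gn = norm(g) + 1e-12
    g_shift = grad([wi + r * gi / gn for wi, gi in zip(w, g)])
    total = [gi + lam * (gsi - gi) / r for gi, gsi in zip(g, g_shift)]
    return [wi - lr * ti for wi, ti in zip(w, total)]


# Usage: run each optimizer for 50 steps from the same starting point.
w_sgd, w_sam, w_gnp = [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]
for _ in range(50):
    w_sgd = sgd_step(w_sgd, lr=0.05)
    w_sam = sam_step(w_sam, lr=0.05)
    w_gnp = gnp_step(w_gnp, lr=0.05)
print(f"SGD {loss(w_sgd):.4f}  SAM {loss(w_sam):.4f}  GNP {loss(w_gnp):.4f}")
```

On this toy problem all three optimizers reach a low loss; the sketch only illustrates the mechanics of each update and says nothing about plasticity.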

Abstract

In recent years, continual learning, a prediction setting in which the problem environment may evolve over time, has become an increasingly popular research field because the framework is geared towards complex, non-stationary objectives. Learning such objectives requires plasticity: the ability of a neural network to adapt its predictions to a new task. Recent findings indicate that plasticity loss on new tasks is closely related to loss landscape sharpness in non-stationary reinforcement learning (RL) settings. We explore whether sharpness regularization techniques, which seek out flat minima and have been credited with improved generalization in vanilla (stationary) prediction settings, can combat plasticity loss. Our findings indicate that such techniques have no significant effect on reducing plasticity loss.
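The link between plasticity and sharpness depends on how sharpness is measured. One common proxy is the largest eigenvalue of the loss Hessian, which can be estimated by power iteration using only gradient evaluations. The sketch below assumes this proxy and a central finite-difference Hessian-vector product; the source does not state which measure the cited findings use, and `toy_grad` is a hypothetical stand-in for a network's gradient.

```python
import math
import random


def hvp(grad_fn, w, v, eps=1e-4):
    # Central finite-difference Hessian-vector product:
    # H v ~= (grad L(w + eps*v) - grad L(w - eps*v)) / (2*eps)
    gp = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    gm = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / (2.0 * eps) for a, b in zip(gp, gm)]


def top_hessian_eigenvalue(grad_fn, w, iters=100, seed=0):
    # Power iteration on the Hessian at w, using only gradient calls.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    n = math.sqrt(sum(x * x for x in v))
    v = [x / n for x in v]
    estimate = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, w, v)
        estimate = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        n = math.sqrt(sum(x * x for x in hv))
        if n == 0.0:
            break  # v lies in the Hessian's null space
        v = [x / n for x in hv]
    return estimate


def toy_grad(w):
    # Hypothetical quadratic L(w) = 0.5 * (10 * w0^2 + w1^2); Hessian = diag(10, 1).
    return [10.0 * w[0], 1.0 * w[1]]


# Usage: the estimate should approach 10, the curvature of the sharper direction.
print(top_hessian_eigenvalue(toy_grad, [0.3, -0.2]))
```

Power iteration returns the eigenvalue of largest magnitude, which equals the top eigenvalue when the Hessian is positive semidefinite, as it is near a minimum.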

Paper Structure

This paper contains 15 sections, 6 equations, 5 figures, and 2 tables.

Introduction
Previous Work
Flat Minima
Plasticity
Continual Learning
Class-Incremental Learning
Domain-Incremental Learning
Experimental Setup
Data
Classifier
Results
Domain-Incremental Learning
Class-Incremental Learning
Discussion
Conclusion and Future Work

Figures (5)

Figure 1: Blue and green regions represent the sets of parameters $\theta$ that achieve high performance on tasks $A$ and $B$, respectively. 1: The model learns parameters $\textbf{x}$ for task $A$ and achieves high accuracy. 2a: After task $B$ is introduced, the model learns some $\textbf{x}$ that performs well on $B$ but poorly on $A$. 2b: The model maintains performance on $A$ but performs poorly on $B$. 2c: The model learns an $\textbf{x}$ that performs well on both $A$ and $B$.
Figure 2: Nine class-incremental learning tasks
Figure 3: One domain-incremental learning task
Figure 4: Task-specific accuracies in the domain-incremental learning problem under different loss minimization schemes (10 runs)
Figure 5: Task-specific accuracies in the class-incremental learning problem under different loss minimization schemes (10 runs)
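The domain-incremental tasks (Figure 3) are built from permuted MNIST. A typical construction, assumed here rather than taken from the paper, draws one fixed random pixel permutation per task and applies it to every flattened 28 x 28 image while leaving labels untouched, so each task shares the same ten classes but has a shifted input distribution. The seeding scheme and the helper names are hypothetical, and the toy "image" below is a stand-in for real MNIST data.

```python
import random

N_PIXELS = 28 * 28  # MNIST images, flattened to 784 values


def make_permutations(n_tasks, n_pixels=N_PIXELS, seed=0):
    # One fixed random pixel permutation per task (hypothetical seeding scheme).
    rng = random.Random(seed)
    perms = []
    for _ in range(n_tasks):
        p = list(range(n_pixels))
        rng.shuffle(p)
        perms.append(p)
    return perms


def permute_image(flat_image, perm):
    # Output pixel i takes the value of input pixel perm[i].
    return [flat_image[j] for j in perm]


def task_stream(dataset, perms):
    # Yield (task_id, permuted inputs, labels). Labels are left unchanged, so every
    # task has the same classes and only the input distribution shifts.
    xs, ys = dataset
    for t, perm in enumerate(perms):
        yield t, [permute_image(x, perm) for x in xs], ys


# Usage with a stand-in "image" (not real MNIST data) labelled 7.
perms = make_permutations(n_tasks=100)
toy = ([[float(i % 256) for i in range(N_PIXELS)]], [7])
for t, xs, ys in task_stream(toy, perms[:3]):
    print(t, xs[0][:5], ys)
```

The class-incremental construction (Figure 2) is not specified in enough detail here to sketch, so it is left out.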
