Neural Network Plasticity and Loss Sharpness
Max Koster, Jude Kukla
TL;DR
This work investigates whether sharpness-based regularization can mitigate plasticity loss in continual learning on non-stationary tasks. We evaluate Sharpness-Aware Minimization (SAM) and Gradient Norm Penalty (GNP) on permuted MNIST in both domain-incremental and class-incremental settings, and find that these techniques do not reduce plasticity loss and in some cases worsen it. Across 100 domain tasks and 45 class tasks, SGD generally remains stable, SAM often degrades performance, and GNP shows some promise in the class-incremental setting but not consistently. The results challenge the notion that sharpness regularization alone suffices for non-stationary continual learning, and they motivate testing on more diverse benchmarks and in reinforcement learning (RL) settings, possibly in combination with other continual learning strategies.
Abstract
In recent years, continual learning, a prediction setting in which the problem environment may evolve over time, has become an increasingly popular research field because it targets complex, non-stationary objectives. Learning such objectives requires plasticity: the ability of a neural network to adapt its predictions to a new task. Recent findings indicate that, in non-stationary reinforcement learning (RL) settings, plasticity loss on new tasks is closely related to the sharpness of the loss landscape. We explore whether sharpness regularization techniques, which seek out smooth minima and have been touted for their generalization benefits in standard prediction settings, can combat plasticity loss. Our findings indicate that such techniques have no significant effect on reducing plasticity loss.
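To make "sharpness regularization" concrete, the following is a minimal sketch of a single SAM-style update on a toy quadratic loss, written in plain Python. It is an illustration under stated assumptions, not the training code used in this work: the function names (`quadratic_loss`, `quadratic_grad`, `sam_step`), the curvature values, and the hyperparameters `lr` and `rho` are all hypothetical. The idea is to first move the weights a distance `rho` along the normalized gradient (toward an approximately worst-case nearby point) and then descend using the gradient evaluated there.

```python
import math

# Hypothetical toy problem: L(w) = 0.5 * sum_i a_i * w_i^2.
# The curvatures below are illustrative assumptions, not values from the paper.
CURVATURES = [1.0, 10.0]


def quadratic_loss(w):
    """Toy loss with one flat and one sharp direction."""
    return 0.5 * sum(a * wi * wi for a, wi in zip(CURVATURES, w))


def quadratic_grad(w):
    """Analytic gradient of the toy loss."""
    return [a * wi for a, wi in zip(CURVATURES, w)]


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM-style update (sketch).

    1. Compute the gradient at w.
    2. Perturb w by rho along the normalized gradient.
    3. Apply a plain gradient step to w using the gradient at the perturbed point.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Zero gradient: no perturbation direction is defined, so leave w unchanged.
        return list(w)
    w_perturbed = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_perturbed = grad_fn(w_perturbed)
    return [wi - lr * gi for wi, gi in zip(w, g_perturbed)]


if __name__ == "__main__":
    w = [1.0, -2.0]
    for _ in range(5):
        w = sam_step(w, quadratic_grad)
    print(w, quadratic_loss(w))
```

Setting `rho` to zero recovers an ordinary gradient descent step, which is the sense in which SAM can be read as SGD with an extra sharpness-seeking perturbation. A gradient norm penalty instead adds a term proportional to the gradient norm to the loss; it is not sketched here because this section does not specify the formulation used.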
