Noradrenergic-inspired gain modulation attenuates the stability gap in joint training
Alejandro Rodriguez-Garcia, Anindya Ghosh, Srikanth Ramaswamy
TL;DR
This work addresses the stability gap in continual learning, a transient drop in old-task performance at task boundaries that persists even under ideal joint training. It introduces a biologically inspired gain modulation mechanism that implements a two-timescale optimization, using the effective weight $W_{\text{eff}} = g(t)\,W$ and an uncertainty-driven gain dynamic $g(t)$ that transiently boosts updates. The approach yields emergent fast and slow learning components and a forward-pass loss-flattening reparameterization with $\lambda_{\text{eff}} = \lambda / g^2$, which together attenuate stability gaps across domain- and class-incremental benchmarks (MNIST, CIFAR, mini-ImageNet) while maintaining competitive accuracy. The results suggest that neuromodulatory gain dynamics provide a lightweight, optimizer-level intervention compatible with replay and other continual-learning techniques, and they reveal gain as a readout of task complexity. The work lays a foundation for integrating biologically inspired gain control with deep learning to improve reliability in online continual learning scenarios.
Abstract
Recent work in continual learning has highlighted the stability gap: a temporary performance drop on previously learned tasks when new ones are introduced. This phenomenon reflects a mismatch between rapid adaptation and strong retention at task boundaries, underscoring the need for optimization mechanisms that balance plasticity and stability across abrupt distribution changes. While optimizers such as momentum-SGD and Adam introduce implicit multi-timescale behavior, they still exhibit pronounced stability gaps. Importantly, these gaps persist even under ideal joint training, so studying them in this setting isolates their causes from other sources of forgetting. Motivated by how noradrenergic (neuromodulatory) bursts transiently increase neuronal gain under uncertainty, we introduce a dynamic gain scaling mechanism as a two-timescale optimization technique that balances adaptation and retention by modulating effective learning rates and flattening the local loss landscape through an effective reparameterization. Across domain- and class-incremental MNIST, CIFAR, and mini-ImageNet benchmarks under task-agnostic joint training, dynamic gain scaling attenuates stability gaps while maintaining competitive accuracy, improving robustness at task transitions.
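
To illustrate how a gain on the weights can modulate the effective learning rate, the following minimal sketch applies one plain SGD step to underlying weights $W$ while the forward pass uses $W_{\text{eff}} = g\,W$. It is not the authors' implementation: the linear model, the squared-error loss, the fixed scalar gain, and the helper names `loss_and_grad` and `sgd_step_with_gain` are all assumptions made for illustration. Under these assumptions the chain rule gives $\partial L / \partial W = g\,\partial L / \partial W_{\text{eff}}$, so the effective weights move by $-\eta\, g^2\, \nabla_{W_{\text{eff}}} L$, meaning a transient increase in $g$ acts like a transiently larger step size.

```python
import numpy as np


def loss_and_grad(w_eff, x, y):
    """Mean squared-error loss of a linear model and its gradient w.r.t. w_eff."""
    residual = x @ w_eff - y
    loss = 0.5 * np.mean(residual ** 2)
    grad = x.T @ residual / len(y)
    return loss, grad


def sgd_step_with_gain(w, g, x, y, lr):
    """One plain SGD step on the underlying weights w, with forward pass w_eff = g * w.

    Hypothetical helper: the gain g is held fixed during the step.
    """
    _, grad_eff = loss_and_grad(g * w, x, y)
    grad_w = g * grad_eff  # chain rule: dL/dw = g * dL/dw_eff
    return w - lr * grad_w


# Toy data (illustrative only, not from the paper's benchmarks).
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))
y = rng.normal(size=32)
w = rng.normal(size=4)
lr, g = 0.1, 2.0

w_new = sgd_step_with_gain(w, g, x, y, lr)
_, grad_eff = loss_and_grad(g * w, x, y)

# Change in the effective weights produced by the step on w.
delta_eff = g * w_new - g * w

# With g fixed, this equals a plain SGD step on w_eff with learning rate lr * g**2.
print(np.allclose(delta_eff, -lr * g**2 * grad_eff))
```

The sketch covers only the step-size effect of a fixed gain. The uncertainty-driven dynamics of $g(t)$ and the precise meaning of $\lambda$ in $\lambda_{\text{eff}} = \lambda / g^2$ are not specified in this summary and are deliberately left out.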
