Noradrenergic-inspired gain modulation attenuates the stability gap in joint training

Alejandro Rodriguez-Garcia, Anindya Ghosh, Srikanth Ramaswamy

TL;DR

This work addresses the stability gap in continual learning, a transient drop in old-task performance at task boundaries that persists even under ideal joint training. It introduces a biologically inspired gain modulation mechanism that implements a two-timescale optimization, using the effective weight $W_{\text{eff}} = g(t)\,W$ and an uncertainty-driven gain dynamic $g(t)$ that transiently boosts updates. The approach yields emergent fast and slow learning components and a forward-pass loss-flattening reparameterization with $\lambda_{\text{eff}} = \lambda / g^2$, which together attenuate stability gaps across domain- and class-incremental benchmarks (MNIST, CIFAR, mini-ImageNet) while maintaining competitive accuracy. The results suggest that neuromodulatory gain dynamics provide a lightweight, optimizer-level intervention compatible with replay and other continual-learning techniques, and they reveal gain as a readout of task complexity. The work lays a foundation for integrating biologically inspired gain control with deep learning to improve reliability in online continual learning scenarios.
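The two relations quoted above can be recovered from the chain rule when the gain is treated as fixed during a single update. The derivation below is only a sketch for a scalar weight under that assumption. The symbol $\eta$ (base learning rate) and the reading of $\lambda$ as the curvature measured with respect to the base weight $W$ are notational assumptions made here, not definitions taken from the paper.

```latex
\begin{aligned}
W_{\text{eff}} &= g\,W
  \;\Rightarrow\;
  \frac{\partial L}{\partial W} = g\,\frac{\partial L}{\partial W_{\text{eff}}}, \\[4pt]
\Delta W &= -\eta\,\frac{\partial L}{\partial W}
  \;\Rightarrow\;
  \Delta W_{\text{eff}} = g\,\Delta W
  = -\eta\,g^{2}\,\frac{\partial L}{\partial W_{\text{eff}}}, \\[4pt]
\lambda &= \frac{\partial^{2} L}{\partial W^{2}}
  = g^{2}\,\frac{\partial^{2} L}{\partial W_{\text{eff}}^{2}}
  \;\Rightarrow\;
  \lambda_{\text{eff}} = \frac{\partial^{2} L}{\partial W_{\text{eff}}^{2}}
  = \frac{\lambda}{g^{2}}.
\end{aligned}
```

Read this way, a gain $g > 1$ makes a step of size $\eta$ on $W$ act like a step of size $\eta\,g^{2}$ on $W_{\text{eff}}$, while the same loss looks flatter by a factor $g^{2}$ in effective-weight coordinates than in base-weight coordinates.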

Abstract

Recent work in continual learning has highlighted the stability gap -- a temporary performance drop on previously learned tasks when new ones are introduced. This phenomenon reflects a mismatch between rapid adaptation and strong retention at task boundaries, underscoring the need for optimization mechanisms that balance plasticity and stability over abrupt distribution changes. While optimizers such as momentum-SGD and Adam introduce implicit multi-timescale behavior, they still exhibit pronounced stability gaps. Importantly, these gaps persist even under ideal joint training, making it crucial to study them in this setting to isolate their causes from other sources of forgetting. Motivated by how noradrenergic (neuromodulatory) bursts transiently increase neuronal gain under uncertainty, we introduce a dynamic gain scaling mechanism as a two-timescale optimization technique that balances adaptation and retention by modulating effective learning rates and flattening the local landscape through an effective reparameterization. Across domain- and class-incremental MNIST, CIFAR, and mini-ImageNet benchmarks under task-agnostic joint training, dynamic gain scaling effectively attenuates stability gaps while maintaining competitive accuracy, improving robustness at task transitions.

Paper Structure

This paper contains 40 sections, 21 equations, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: Schematic of the stability gap. a. Conceptual illustration highlighting the distinction between the stability gap and catastrophic forgetting. While catastrophic forgetting arises from an inaccurate approximation of the joint loss, the stability gap emerges from a deviation of the optimization trajectory from the path of non- or minimally-increasing loss, leading to a transient drop in old-task performance. b. Old-task accuracy during sequential learning on Split CIFAR-10 under stochastic gradient descent with 0.9 momentum. The sharp performance drop at the task boundary quantifies the stability gap that arises from the deviated trajectory.
  • Figure 2: Simple proof of principle of gain boost approximating a two-timescale optimizer. A) Schematic of gain-induced effective weight decomposition under noradrenergic neuromodulation. Phasic noradrenergic signals transiently increase neuronal gain, thereby splitting the effective weight into a fast–slow scheme with $w_{\text{fast}} = \left[g(t) - g_{0}\right] w(t)$ and $w_{\text{slow}} = g_{0} \, w(t)$. B) Left panel. Simplified model $y(t) = W_{\mathrm{eff}}(t)\,x$ with constant input $x$ predicting a target $T(t)$ under mean squared error loss $L = \tfrac{1}{2}[T(t)-y(t)]^2$. Comparison of the gain-modulated (orange), fast–slow (blue), and slow (purple) weight methods. The unshaded region corresponds to optimization toward target $T_1$, and the shaded region to optimization toward target $T_2$. The dashed vertical line marks the time of peak neuronal gain. Right panel. Loss landscape flattening induced by gain neuromodulation at the highest neuronal gain. In the effective weight space, gain boosts reparametrize the loss, effectively reducing its curvature, $\lambda \rightarrow \lambda/g^2$. Dots illustrate each method's state at the peak gain time, and arrow lengths indicate the size of the gradient step.
  • Figure 3: Stability gaps under class-incremental learning. The left column shows results on Split MNIST, the center column on Split CIFAR-10, and the right column on Split mini-ImageNet. The top panels display the test accuracy on the first task as the model is incrementally trained on all benchmark tasks. The middle panels show the corresponding test loss, and the bottom panels depict the evolution of neuronal gain across training iterations. Curves represent the mean $\pm$ standard error (shaded area) over five runs with different random seeds. Dots illustrate the min-ACC per task. Color coding denotes the different optimizers: our method (NGM-SGD, Algorithm 1) in orange, momentum-SGD (MSGD) in blue, Adam in green, and vanilla SGD in gray. Plots have been zoomed in to better highlight the stability gaps, so some performance drops may appear truncated.
  • Figure 4: Stability gaps under domain-incremental learning. The left column shows results on Rotated MNIST with $80^\circ$ rotations, and the right column on Domain CIFAR-100. The top panels display the test accuracy on the first task as the model is incrementally trained on all benchmark tasks. The middle panels show the corresponding test loss, and the bottom panels depict the evolution of neuronal gain across training iterations. Curves represent the mean $\pm$ standard error (shaded area) over five runs with different random seeds. Dots illustrate the min-ACC per task. Color coding denotes the different optimizers: our method (NGM-SGD, Algorithm 1) in orange, momentum-SGD (MSGD) in blue, Adam in green, and vanilla SGD in gray. Plots have been zoomed in to better highlight the stability gaps, so some performance drops may appear truncated.
  • Figure 5: Effect of ablating gain modulation on Split MNIST. Test accuracy (top left), task-1 accuracy (top right), and neuronal gain dynamics (bottom) are shown for NGM-SGD (orange) and its ablation G0-SGD (purple).
  • ...and 4 more figures
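
The single-weight toy model described in the Figure 2 caption ($y = W_{\mathrm{eff}}\,x$ with $W_{\mathrm{eff}} = g\,w$, constant input $x$, and loss $L = \tfrac{1}{2}[T - y]^2$) can be sketched in a few lines. The sketch below is illustrative only: the gain rule (relaxation toward $g_0$ plus a push proportional to the current loss), the function name `simulate`, and all parameter names and values are assumptions made here and are not taken from the paper or its Algorithm. It also records the fast and slow components $w_{\text{fast}} = [g - g_0]\,w$ and $w_{\text{slow}} = g_0\,w$ from the caption.

```python
import numpy as np


def simulate(targets, steps_per_target=200, lr=0.05, x=1.0,
             g0=1.0, gain_boost=0.0, gain_decay=0.05):
    """Toy model y = g * w * x trained by SGD on L = 0.5 * (T - y)**2.

    gain_boost and gain_decay define a HYPOTHETICAL gain rule (an
    assumption of this sketch, not the paper's rule): the gain relaxes
    toward g0 and is pushed up in proportion to the current loss.
    With gain_boost=0 the gain stays at g0 (fixed-gain baseline).
    """
    w, g = 0.0, g0
    history = {"loss": [], "gain": [], "w_fast": [], "w_slow": []}
    for target in targets:
        for _ in range(steps_per_target):
            y = g * w * x                      # forward pass with W_eff = g * w
            err = target - y
            loss = 0.5 * err ** 2
            history["loss"].append(loss)
            history["gain"].append(g)
            history["w_fast"].append((g - g0) * w)   # fast component
            history["w_slow"].append(g0 * w)         # slow component
            # Chain rule: dL/dw = -(T - y) * g * x, so the SGD step is
            w += lr * err * g * x
            # Hypothetical loss-driven gain dynamics (assumption).
            g += gain_decay * (g0 - g) + gain_boost * loss
    return {k: np.array(v) for k, v in history.items()}


if __name__ == "__main__":
    baseline = simulate(targets=(1.0, 2.0))                  # gain fixed at g0
    boosted = simulate(targets=(1.0, 2.0), gain_boost=0.5)   # loss-driven gain
    print("peak gain:", boosted["gain"].max())
    print("summed loss (baseline, boosted):",
          baseline["loss"].sum(), boosted["loss"].sum())
```

With these untuned settings the gain rises after each target change and then relaxes back toward `g0`. Because the effective step on `g * w` scales with `g ** 2`, other choices of `lr`, `gain_boost`, or targets can behave differently or become unstable, so the parameters should be treated as placeholders.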