Continual Learning through Control Minimization

Sander de Haan; Yassine Taoudi-Benchekroun; Pau Vilimelis Aceituno; Benjamin F. Grewe

TL;DR

This work reframes continual learning as a control problem in which learning and prior-task preservation compete within neural dynamics, introducing Equilibrium Fisher Control (EFC). By converting parameter-space regularizers into neuron-specific preservation signals and operating the learning process at equilibrium, EFC induces a continual-natural gradient that implicitly encodes prior-task curvature without storing full Fisher matrices. Theoretical results show that the equilibrium-based preconditioning filters interference from earlier tasks and supports class-incremental convergence, with tighter forgetting bounds than traditional regularization methods. Empirically, EFC recovers curvature-like structure dynamically, matches full-Fisher baselines on forgetting profiles, and achieves strong performance on Split-MNIST, Split-CIFAR10, and Split-Tiny-ImageNet without replay, indicating practical viability and potential biological relevance.

Abstract

Catastrophic forgetting remains a fundamental challenge for neural networks when tasks are trained sequentially. In this work, we reformulate continual learning as a control problem where learning and preservation signals compete within neural activity dynamics. We convert regularization penalties into preservation signals that protect prior-task representations. Learning then proceeds by minimizing the control effort required to integrate new tasks while competing with the preservation of prior tasks. At equilibrium, the neural activities produce weight updates that implicitly encode the full prior-task curvature, a property we term the continual-natural gradient, requiring no explicit curvature storage. Experiments confirm that our learning framework recovers true prior-task curvature and enables task discrimination, outperforming existing methods on standard benchmarks without replay.
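To make the role of "full prior-task curvature" concrete, here is a minimal Python sketch. It is a generic illustration of curvature-aware preconditioning, not the EFC dynamics described in the paper. It assumes a hypothetical 2x2 Task A Fisher matrix (`FISHER_A`), a hypothetical Task B gradient (`GRAD_B`), and the common second-order proxy for forgetting, d^T F_A d. All numbers and names are made up for illustration. The sketch compares three update directions by how much Task A loss they cost per unit of squared first-order progress on Task B.

```python
def mat_vec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def dot(u, v):
    """Inner product of two length-2 vectors."""
    return u[0] * v[0] + u[1] * v[1]


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def forgetting_per_progress(direction, fisher_a, grad_b):
    """Second-order Task A loss increase (d^T F_A d) divided by the squared
    first-order Task B progress ((g_B . d)^2). The ratio does not depend on
    the step size or the sign of the direction."""
    quad = dot(direction, mat_vec(fisher_a, direction))
    return quad / dot(grad_b, direction) ** 2


# Hypothetical values, chosen only for illustration.
FISHER_A = [[4.0, 1.9], [1.9, 1.0]]        # full Task A Fisher (positive definite)
FISHER_A_DIAG = [[4.0, 0.0], [0.0, 1.0]]   # diagonal-only approximation
GRAD_B = [1.0, -1.0]                       # Task B gradient at the current weights

directions = {
    "plain": GRAD_B,                                           # unpreconditioned gradient
    "diagonal": mat_vec(inverse_2x2(FISHER_A_DIAG), GRAD_B),   # diagonal preconditioning
    "full": mat_vec(inverse_2x2(FISHER_A), GRAD_B),            # full-Fisher preconditioning
}

ratios = {name: forgetting_per_progress(d, FISHER_A, GRAD_B)
          for name, d in directions.items()}

for name, ratio in ratios.items():
    print(f"{name:>8}: {ratio:.4f}")
```

With these made-up numbers the ratios come out at roughly 0.30 (plain), 0.19 (diagonal), and 0.04 (full). The full-Fisher direction attains the smallest possible value, 1 / (g_B^T F_A^{-1} g_B), which is one way to see why off-diagonal curvature matters for limiting interference.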

Paper Structure (46 sections, 19 theorems, 116 equations, 2 figures, 1 table)

Introduction
The Equilibrium Fisher Control framework
Preservation signal
Network dynamics
Learning objective
Learning theory
The continual-natural gradient property
Class-incremental convergence
Why parameter-based regularization cannot cancel the interference term
EFC filters the interference term
Forgetting bounds in task-incremental learning
Experiments
Empirical validation of the learning theory
The continual-natural gradient approximates the full Fisher
EFC achieves class-incremental convergence
...and 31 more sections

Key Result

Theorem 3.1

(Informal) For a small learning rate $\eta$ and a linearization around $\theta_A^*$, the weight update for a sample $x_B$ satisfies [equation missing from this extract], where $\tilde{F}_A \triangleq F_A|_{x_B}$ is an implicit approximation of the full Fisher information matrix $F_A$ that emerges from the network dynamics when processing sample $x_B$. An explicit form is derived in the Supplementary Material (label supp:continual-natural).
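
For context, the role of $F_A$ in statements of this kind can be read off a standard second-order expansion. This is textbook background, not the theorem's own formula. It assumes that $\theta_A^*$ minimizes the Task A loss $L_A$ and that the Hessian at $\theta_A^*$ is well approximated by the Fisher matrix. The symbol $\Delta\theta$ is notation introduced here for a generic weight update.

```latex
L_A(\theta_A^* + \Delta\theta) - L_A(\theta_A^*)
  \approx \nabla L_A(\theta_A^*)^\top \Delta\theta
        + \tfrac{1}{2}\, \Delta\theta^\top F_A\, \Delta\theta
  = \tfrac{1}{2}\, \Delta\theta^\top F_A\, \Delta\theta
```

The first-order term vanishes because the gradient is zero at a minimum. Under these assumptions, updates confined to low-curvature directions of $F_A$ leave the Task A loss nearly unchanged to second order.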

Figures (2)

Figure 1: Empirical validation of the learning theory. (a) Left: Task A loss increase as a function of training on Task B; the dynamical approximation of the full Fisher, $\tilde{F}$, follows the analytical full Fisher $F$. Right: the Task A accuracy decrease is smaller for the EFC method than for the full Fisher. (b) Class-IL loss (left) and accuracy (right) fail to converge for every backpropagation-based regularization method but converge for the EFC method.
Figure S1: Confusion matrices on class-incremental Split-MNIST for online EWC (Schwarz et al., 2018), Dark Experience Replay++ (Buzzega et al., 2020), and Equilibrium Fisher Control. Each matrix plots the true label against the predicted label; values are percentages.

Theorems & Definitions (23)

Theorem 3.1
Theorem 3.2
Theorem 3.3
Theorem 1.1: First-order gradient
Theorem 1.2: General Class of Controllers for Multiplicative LCP
Theorem 1.3: Convergence and equilibrium points
Corollary 1.4: Special case: $\beta = 0$
Corollary 1.5: Special case: $\beta \neq 0$
Theorem 2.1: Steady-state approximation for multiplicative LCP
proof
...and 13 more
