Continual Learning through Control Minimization

Sander de Haan, Yassine Taoudi-Benchekroun, Pau Vilimelis Aceituno, Benjamin F. Grewe

TL;DR

This work reframes continual learning as a control problem in which learning and prior-task preservation compete within neural dynamics, introducing Equilibrium Fisher Control (EFC). By converting parameter-space regularizers into neuron-specific preservation signals and operating the learning process at equilibrium, EFC induces a continual-natural gradient that implicitly encodes prior-task curvature without storing full Fisher matrices. Theoretical results show that the equilibrium-based preconditioning filters interference from earlier tasks and supports class-incremental convergence, with tighter forgetting bounds than traditional regularization methods. Empirically, EFC recovers curvature-like structure dynamically, matches full-Fisher baselines on forgetting profiles, and achieves strong performance on Split-MNIST, Split-CIFAR10, and Split-Tiny-ImageNet without replay, indicating practical viability and potential biological relevance.

Abstract

Catastrophic forgetting remains a fundamental challenge for neural networks when tasks are trained sequentially. In this work, we reformulate continual learning as a control problem where learning and preservation signals compete within neural activity dynamics. We convert regularization penalties into preservation signals that protect prior-task representations. Learning then proceeds by minimizing the control effort required to integrate new tasks while competing with the preservation of prior tasks. At equilibrium, the neural activities produce weight updates that implicitly encode the full prior-task curvature, a property we term the continual-natural gradient, requiring no explicit curvature storage. Experiments confirm that our learning framework recovers true prior-task curvature and enables task discrimination, outperforming existing methods on standard benchmarks without replay.
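For intuition about what "encoding prior-task curvature" buys, the sketch below compares a plain gradient step on Task B with a step preconditioned by an explicitly stored Fisher matrix for Task A. This is the classical explicit-curvature setting that the abstract says EFC avoids, not the EFC dynamics themselves. The function names, the 2x2 Fisher matrix, the gradient, and the values of `lr` and `lam` are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: an explicit Fisher-preconditioned step in 2D.
# This is NOT the paper's EFC method; all names and numbers are assumptions.

def solve_2x2(a, b):
    """Solve a @ x = b for a 2x2 matrix `a` using Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [x0, x1]


def plain_step(grad_b, lr):
    """Ordinary gradient step on Task B: delta = -lr * grad_B."""
    return [-lr * g for g in grad_b]


def fisher_preconditioned_step(grad_b, fisher_a, lr, lam):
    """Step delta = -lr * (I + lr * lam * F_A)^{-1} grad_B."""
    m = [
        [(1.0 if i == j else 0.0) + lr * lam * fisher_a[i][j] for j in range(2)]
        for i in range(2)
    ]
    return solve_2x2(m, plain_step(grad_b, lr))


def task_a_penalty(delta, fisher_a):
    """Quadratic proxy for Task A loss increase: 0.5 * delta^T F_A delta."""
    return 0.5 * sum(
        delta[i] * fisher_a[i][j] * delta[j] for i in range(2) for j in range(2)
    )


# Hypothetical Task A curvature: direction 0 matters a lot, direction 1 barely.
FISHER_A = [[10.0, 0.0], [0.0, 0.1]]
GRAD_B = [1.0, 1.0]

delta_plain = plain_step(GRAD_B, lr=0.1)
delta_pre = fisher_preconditioned_step(GRAD_B, FISHER_A, lr=0.1, lam=10.0)

print("plain step:", delta_plain, "penalty:", task_a_penalty(delta_plain, FISHER_A))
print("preconditioned step:", delta_pre, "penalty:", task_a_penalty(delta_pre, FISHER_A))
```

The preconditioned step shrinks movement along the direction that Task A is sensitive to while leaving the low-curvature direction almost unchanged, so the quadratic proxy for forgetting is much smaller. Storing and inverting such a matrix scales poorly with parameter count, which is the cost the abstract's "no explicit curvature storage" claim refers to.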

Paper Structure (46 sections, 19 theorems, 116 equations, 2 figures, 1 table)

Key Result

Theorem 3.1

(Informal) For a small learning rate $\eta$ and a linearization around $\theta_A^*$, the weight update for a sample $x_B$ satisfies a continual-natural gradient relation in which $\tilde{F}_A \triangleq F_A|_{x_B}$ is an implicit approximation of the full Fisher information matrix $F_A$ that emerges from the network dynamics when processing sample $x_B$. An explicit form is derived in the Supplementary Material (supp:continual-natural).
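As background for how a Fisher matrix can end up preconditioning a weight update, the derivation below works through the standard explicit case: a linearized Task B loss at $\theta_A^*$, a quadratic Fisher penalty with weight $\lambda$, and a proximal term set by the learning rate $\eta$. It is a textbook sketch under those assumptions, not the statement of Theorem 3.1; the symbols $\lambda$, $g_B$ and $\mathcal{L}_B$ are introduced here for illustration, and the theorem itself concerns the implicit $\tilde{F}_A$ rather than a stored $F_A$.

```latex
\begin{aligned}
g_B &\triangleq \nabla_\theta \mathcal{L}_B(\theta_A^*; x_B), \\
\Delta\theta^\star &= \arg\min_{\Delta\theta}\;
  g_B^\top \Delta\theta
  + \frac{\lambda}{2}\, \Delta\theta^\top F_A\, \Delta\theta
  + \frac{1}{2\eta}\, \lVert \Delta\theta \rVert^2, \\
0 &= g_B + \lambda F_A\, \Delta\theta^\star + \frac{1}{\eta}\, \Delta\theta^\star, \\
\Delta\theta^\star &= -\eta\, \bigl(I + \eta \lambda F_A\bigr)^{-1} g_B .
\end{aligned}
```

With $\lambda = 0$ this reduces to the plain gradient step $-\eta\, g_B$; as $\eta\lambda$ grows, components of $g_B$ along high-curvature directions of $F_A$ are increasingly suppressed.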

Figures (2)

  • Figure 1: Empirical validation of the learning theory. (a) Task A loss increase as a function of training on Task B: the dynamical approximation of the full Fisher, $\tilde{F}$, follows the analytical full Fisher $F$ (left). The decrease in Task A accuracy is smaller for EFC than for the full Fisher (right). (b) Class-IL loss (left) and accuracy (right) do not converge for any backpropagation-based regularization method but do converge for EFC.
  • Figure S1: Confusion matrices on class-incremental Split-MNIST for online EWC (Schwarz et al., 2018), Dark Experience Replay++ (Buzzega et al., 2020), and Equilibrium Fisher Control. Each matrix plots the true label against the predicted label; values are percentages.

Theorems & Definitions (23)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Theorem 1.1: First-order gradient
  • Theorem 1.2: General Class of Controllers for Multiplicative LCP
  • Theorem 1.3: Convergence and equilibrium points
  • Corollary 1.4: Special case: $\beta = 0$
  • Corollary 1.5: Special case: $\beta \neq 0$
  • Theorem 2.1: Steady-state approximation for multiplicative LCP
  • proof
  • ...and 13 more