Non-Equilibrium Stochastic Dynamics as a Unified Framework for Insight and Repetitive Learning: A Kramers Escape Approach to Continual Learning

Gunn Kim

Abstract

Continual learning in artificial neural networks is fundamentally limited by the stability--plasticity dilemma: systems that retain prior knowledge tend to resist acquiring new knowledge, and vice versa. Existing approaches, most notably elastic weight consolidation (EWC), address this empirically without a physical account of why plasticity eventually collapses as tasks accumulate. Separately, the distinction between sudden insight and gradual skill acquisition through repetitive practice has lacked a unified theoretical description. Here, we show that both problems admit a common resolution within non-equilibrium statistical physics. We model the state of a learning system as a particle evolving under Langevin dynamics on a double-well energy landscape, with the noise amplitude governed by a time-dependent effective temperature $T(t)$. The probability density obeys a Fokker--Planck equation, and transitions between metastable states are governed by the Kramers escape rate $k = (\omega_0\omega_b/2\pi)\,e^{-\Delta E/T}$. We make two contributions. First, we identify the EWC penalty term as an energy barrier whose height grows linearly with the number of accumulated tasks, yielding an exponential collapse of the transition rate predicted analytically and confirmed numerically. Second, we show that insight and repetitive learning correspond to two qualitatively distinct temperature protocols within the same Fokker--Planck equation: insight events produce transient spikes in $T(t)$ that drive rapid barrier crossing, whereas repetitive practice operates at a modestly elevated but fixed temperature, achieving transitions through sustained stochastic diffusion. These results establish a physically grounded framework for understanding plasticity and its failure in continual learning systems, and suggest principled design criteria for adaptive noise schedules in artificial intelligence.
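
As a rough illustration of the first contribution, the following Python sketch evaluates the Kramers rate $k = (\omega_0\omega_b/2\pi)\,e^{-\Delta E/T}$ for a barrier that grows linearly with the task count $n$. Several ingredients are assumptions rather than values taken from the paper: the curvature frequencies are computed as $\omega = \sqrt{|E''|}$ for the double well $E(s)=(s^2-1)^2$ (giving $\omega_0=\sqrt{8}$ and $\omega_b=2$), the prefactor is held fixed as the barrier grows, and the slope `delta_per_task` is a hypothetical placeholder. The function names are likewise illustrative.

```python
import math

# Curvatures of E(s) = (s^2 - 1)^2: E''(s) = 12 s^2 - 4,
# so E''(+-1) = 8 at the minima and |E''(0)| = 4 at the barrier top.
# Taking omega = sqrt(|E''|) is an assumed convention (unit mass and friction).
OMEGA_0 = math.sqrt(8.0)
OMEGA_B = 2.0


def kramers_rate(T, delta_E=1.0):
    """Kramers escape rate k = (omega_0 * omega_b / 2 pi) * exp(-delta_E / T)."""
    return (OMEGA_0 * OMEGA_B / (2.0 * math.pi)) * math.exp(-delta_E / T)


def ewc_barrier(n, delta_E0=1.0, delta_per_task=0.1):
    """Barrier after n tasks, assuming linear growth.

    delta_per_task is a hypothetical slope, not a value from the paper.
    """
    return delta_E0 + delta_per_task * n


def ewc_rate(n, T0=0.22):
    """Transition rate at fixed temperature T0 after n accumulated tasks."""
    return kramers_rate(T0, delta_E=ewc_barrier(n))


def compensating_temperature(n, T0=0.22):
    """Temperature that keeps delta_E(n) / T equal to its n = 0 value."""
    return T0 * ewc_barrier(n) / ewc_barrier(0)
```

Under these assumptions the fixed-temperature rate falls as $k(n)/k(0) = e^{-n\,\delta/T_0}$, where $\delta$ is the assumed barrier increment per task, while raising the temperature in proportion to the barrier height keeps the exponent, and hence the rate, constant.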

Paper Structure

This paper contains 25 sections, 23 equations, 4 figures.

Figures (4)

  • Figure 1: Model setup. (a) Double-well energy landscape $E(s)=(s^2-1)^2$ with the three temperature levels indicated. The barrier height is $\Delta E=1$, with minima at $s=\pm1$ and barrier at $s=0$. (b) Schematic of the three temperature protocols: fixed $T_0=0.22$ (EWC-like, blue dashed), repetitive training at elevated fixed $T_R=0.32$ (green dash-dot), and adaptive $T(t)$ with transient insight spikes to $T_{\rm kick}=0.95$ (red solid). The equally spaced spikes are illustrative; in the simulation, spikes occur periodically with interval $\Delta t_{\rm kick}=50\,\rm s$ as a simplified proxy for event-driven triggers such as prediction error or novelty signals.
  • Figure 2: Simulation results for the three protocols ($T_0=0.22$, $T_R=0.32$, $T_{\rm kick}=0.95$; $N=6\times10^5$ steps, $\Delta t=10^{-3}$). (a)--(c) Representative trajectories showing zero transitions under fixed $T_0$, frequent transitions under adaptive $T(t)$, and occasional transitions under repetitive $T_R$. The number of well-to-well transitions is shown in each panel. (d) Time series of $T(t)$ for the adaptive protocol; vertical lines mark transition events. (e) Steady-state probability density $\rho(s)$ for all three protocols, compared to the Boltzmann distribution at $T_0$ (black dotted). The adaptive protocol produces a near-symmetric bimodal distribution; the fixed protocol remains unimodal. The shaded region indicates the density gain of the adaptive protocol relative to the fixed protocol.
  • Figure 3: Quantitative validation against Kramers theory. (a) Transition rates measured in simulation (well-to-well criterion, $|s|>0.7$) across a sweep of fixed temperatures on a logarithmic $y$-axis, compared to the Kramers curve $k=(\omega_0\omega_b/2\pi)\,e^{-\Delta E/T}$ (solid black line). The linear relationship between $\log k$ and $1/T$ (Arrhenius form) is clearly visible. Gray circles: fixed-$T$ sweep. Colored markers: the three operating points (fixed $T_0$, blue square; adaptive $\langle T\rangle$, red triangle; repetitive $T_R$, green diamond). (b) Transition rates for the three protocols; dashed lines indicate Kramers predictions.
  • Figure 4: EWC plasticity collapse. (a) Transition rate as a function of the number of accumulated tasks $n$ for EWC (theory, blue dashed line; simulation, open squares) and the adaptive $T(t)$ protocol (red triangles). The EWC rate collapses exponentially as predicted by Eq. \ref{eq:ewc_collapse}, while the adaptive protocol maintains a constant rate. The repetitive protocol (green diamonds) provides a constant but uncompensated reference. (b) Barrier height $\Delta E(n)$ (black squares, left axis) and the temperature $T(n)$ required to maintain a constant transition rate (red triangles, right axis) as functions of $n$.
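
The setup described in the captions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes overdamped Langevin dynamics with unit friction, $ds = -E'(s)\,dt + \sqrt{2T(t)\,dt}\,\xi$, integrated with the Euler--Maruyama scheme. The double well $E(s)=(s^2-1)^2$, the temperatures, the kick interval, the step size, and the $|s|>0.7$ well criterion come from the captions. The rectangular spike shape and its width `spike_width`, and all function names, are hypothetical.

```python
import math
import random


def fixed_temperature(T):
    """Constant protocol, e.g. T0 = 0.22 (EWC-like) or T_R = 0.32 (repetitive)."""
    return lambda t: T


def adaptive_temperature(t, T0=0.22, T_kick=0.95, interval=50.0, spike_width=2.0):
    """Baseline T0 with periodic rectangular spikes to T_kick.

    spike_width and the rectangular shape are assumptions for illustration.
    """
    in_spike = t >= interval and (t % interval) < spike_width
    return T_kick if in_spike else T0


def simulate(temperature, n_steps, dt=1e-3, s0=-1.0, seed=0):
    """Euler-Maruyama integration of ds = -E'(s) dt + sqrt(2 T(t) dt) * xi."""
    rng = random.Random(seed)
    s = s0
    trajectory = [s]
    for i in range(n_steps):
        T = temperature(i * dt)
        drift = -4.0 * s * (s * s - 1.0)  # -E'(s) for E(s) = (s^2 - 1)^2
        s += drift * dt + math.sqrt(2.0 * T * dt) * rng.gauss(0.0, 1.0)
        trajectory.append(s)
    return trajectory


def count_transitions(trajectory, threshold=0.7):
    """Count well-to-well transitions using the |s| > threshold criterion."""
    well = 0  # 0: not yet assigned, -1: left well, +1: right well
    count = 0
    for s in trajectory:
        if s > threshold:
            new_well = 1
        elif s < -threshold:
            new_well = -1
        else:
            continue  # between the wells: no assignment yet
        if well != 0 and new_well != well:
            count += 1
        well = new_well
    return count
```

For example, `count_transitions(simulate(adaptive_temperature, 600_000))` runs the adaptive protocol for the $N=6\times10^5$ steps quoted in Figure 2, and `simulate(fixed_temperature(0.22), 600_000)` gives the fixed-$T_0$ baseline. Counts depend on the seed and on the assumed spike width, so they should not be expected to reproduce the paper's figures exactly.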