A Thermodynamic Theory of Learning Part II: Critical Period Closure and Continual Learning Failure

Daisuke Okanohara

TL;DR

This work develops a thermodynamic view of continual learning. It treats learning as finite-time transport in the space of parameter distributions and shows that irreversibility, quantified by entropy production, constrains not only the final solution but the entire learning trajectory. Building on the Epistemic Speed Limit, the paper introduces a trajectory-level analysis based on a free-energy Lyapunov framework and a geometric notion of task-preserving entropy, revealing how excess dissipation selectively erodes task-equivalent degrees of freedom and leads to critical period closure. It also formalizes a dissipation-driven reachability obstruction: after a reference task, a bounded dissipation budget restricts subsequent learning to a Wasserstein ball around the initial distribution. Some new-task regions therefore become dynamically inaccessible, and forgetting follows if those regions lie outside the reachable set. The results reinterpret catastrophic forgetting as an irreversible dynamical constraint of finite-time learning and offer principles for preserving task-equivalent directions, guiding future algorithm design within non-equilibrium thermodynamic limits.
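The reachability obstruction can be illustrated with a small sketch. It assumes, purely for illustration, that parameter distributions are Gaussians with diagonal covariance (so the Wasserstein-2 distance has a closed form) and that the dissipation bound takes the standard overdamped-Langevin speed-limit form $\Sigma \ge W_2^2/(D\tau)$, with a hypothetical diffusion constant $D$ and duration $\tau$. The exact statement of the Epistemic Speed Limit in Part I may differ. Under these assumptions, a budget $\Sigma_{\max}$ confines reachable distributions to a Wasserstein ball of radius $\sqrt{D\tau\Sigma_{\max}}$. All function names below are hypothetical.

```python
import math


def w2_gaussian_diag(m0, s0, m1, s1):
    """2-Wasserstein distance between N(m0, diag(s0**2)) and N(m1, diag(s1**2)).

    With diagonal (commuting) covariances, the Bures term reduces to the
    squared difference of per-coordinate standard deviations.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(m0, m1))
    cov_term = sum((a - b) ** 2 for a, b in zip(s0, s1))
    return math.sqrt(mean_term + cov_term)


def reachable_radius(budget, diffusion, duration):
    """Radius of the Wasserstein ball allowed by an entropy-production budget.

    ASSUMPTION (not taken from the paper): a speed limit of the
    overdamped-Langevin form  Sigma >= W2**2 / (diffusion * duration).
    """
    return math.sqrt(budget * diffusion * duration)


def is_reachable(start, target, budget, diffusion, duration):
    """True if `target` lies inside the ball reachable from `start`.

    `start` and `target` are (means, stds) pairs describing diagonal Gaussians.
    """
    distance = w2_gaussian_diag(*start, *target)
    return distance <= reachable_radius(budget, diffusion, duration)


# Hypothetical post-task distribution and two candidate new-task targets.
start = ([0.0, 0.0], [1.0, 1.0])
near = ([0.5, 0.0], [1.0, 1.0])   # W2 = 0.5
far = ([3.0, 0.0], [1.0, 0.2])    # W2 = sqrt(9 + 0.64), about 3.1

print(is_reachable(start, near, budget=1.0, diffusion=1.0, duration=1.0))  # True
print(is_reachable(start, far, budget=1.0, diffusion=1.0, duration=1.0))   # False
```

Under this assumed bound, the radius grows only as the square root of the budget. At fixed $D=\tau=1$, the distant target needs a budget of roughly $9.64$, almost ten times the one used above.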

Abstract

Learning performed over finite time is necessarily irreversible. In Part I of this series, we modeled learning as a transport process in the space of parameter distributions and derived the Epistemic Speed Limit, which lower-bounds the entropy production of finite-time learning. In this work (Part II), we study the consequences of this irreversibility for continual learning from a trajectory-level perspective. We show that finite dissipation constrains not only which solutions are reachable but also which learning paths remain dynamically accessible. Although a continuum of task-equivalent realizations can achieve identical task performance, finite-time learning irreversibly selects among these realizations. This selection occurs through the progressive elimination of degrees of freedom that would otherwise enable structural reconfiguration. We refer to this phenomenon as critical period closure: beyond a certain stage of learning, transitions between compatible representations become dynamically inaccessible under any finite dissipation budget. As a result, continual learning failure arises not from the absence of solutions that satisfy multiple tasks, but from an irreversible loss of representational freedom induced by prior learning. This reframes catastrophic forgetting as a dynamical constraint imposed by finite-time dissipation rather than as direct task interference.

Paper Structure (28 sections, 3 theorems, 14 equations, 2 figures)

Key Result

Proposition 1

Assume that finite-time dissipation induces a uniform contraction rate $\lambda>0$ along a $k$-dimensional subspace $\mathcal{U}$, in the sense that the Jacobian of the learning flow satisfies […]. Here, $\lambda$ quantifies the minimal rate of dissipation-induced contraction. Then the differential entropy satisfies […], and any covariance-based proxy for the effective support volume along $\mathcal{U}$ decreases …
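A minimal numerical sketch of the mechanism behind Proposition 1 follows. Its simplifying assumptions are ours, not the paper's: a noiseless linear flow $\dot x = Ax$ with diagonal $A$, contraction rates $r_i \ge \lambda$ on the $k$ coordinates spanning $\mathcal{U}$, rate $0$ elsewhere, and a Gaussian parameter distribution. For such a flow the differential entropy changes at rate $\operatorname{tr} A = -\sum_i r_i$, so over time $t$ it drops by at least $k\lambda t$, and the covariance determinant restricted to $\mathcal{U}$ shrinks at least as fast as $e^{-2k\lambda t}$. This illustrates the direction of the result and does not reproduce the proposition's exact statement.

```python
import math


def gaussian_entropy(variances):
    """Differential entropy (nats) of a Gaussian with diagonal covariance."""
    n = len(variances)
    return 0.5 * (n * math.log(2 * math.pi * math.e)
                  + sum(math.log(v) for v in variances))


def evolve_variances(variances, rates, t):
    """Variances after time t under the noiseless flow dx_i/dt = -rate_i * x_i.

    Each coordinate is scaled by exp(-rate_i * t), so its variance is
    scaled by exp(-2 * rate_i * t).
    """
    return [v * math.exp(-2.0 * r * t) for v, r in zip(variances, rates)]


# Hypothetical setting: k = 2 contracting coordinates span U (rates >= lam),
# and one neutral coordinate (rate 0) lies outside U.
lam, k, t = 0.5, 2, 2.0
rates = [0.5, 0.8, 0.0]
var0 = [1.0, 1.0, 1.0]
var_t = evolve_variances(var0, rates, t)

entropy_drop = gaussian_entropy(var0) - gaussian_entropy(var_t)
det_u = var_t[0] * var_t[1]          # covariance determinant restricted to U

print(round(entropy_drop, 6))        # 2.6, i.e. (0.5 + 0.8) * t
print(entropy_drop >= k * lam * t)   # True: at least k * lam * t = 2.0
print(det_u <= math.exp(-2 * k * lam * t))  # True
```

The neutral coordinate keeps its variance, so the entire entropy loss comes from the contracting subspace, which is the pattern the proposition describes.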

Figures (2)

  • Figure 1: Support loss under finite-time SGD compared with an optimal-transport baseline toward the Gibbs target. (Left) Evolution of the parameter distribution in a curved potential that is flat along the $y$-direction at the optimum. Red ellipses show snapshots of stochastic gradient descent (SGD), which contracts early along this nominally irrelevant direction because of transient coupling with the stiff direction. Blue ellipses depict the Wasserstein-2 optimal-transport (OT) geodesic connecting the same initial distribution to the ideal Gibbs distribution of the potential (green), approximated by a Gaussian fit. The OT trajectory is not a learning dynamics but a minimal-dissipation reference: it shows how much support could in principle be preserved while transporting the distribution to the same target. Light gray contours indicate level sets of the potential, and the black line marks the valley center $y=y^\ast$. (Right) Time evolution of the covariance determinant and the directional variances. Although both trajectories converge toward the same Gibbs target distribution, SGD loses volume and collapses along the flat direction significantly earlier than the OT baseline. This premature support loss indicates excess dissipation induced by finite-time gradient-flow dynamics, not a property of the target distribution itself.
  • Figure 2: Irreversibility of entropy loss under evolving local coordinates. (a) Initially, the parameter distribution occupies a broad effective support. (b) Finite-time learning with excess dissipation induces anisotropic contraction, collapsing the distribution along certain directions. (c) Subsequent dynamics may rotate or deform the locally task-preserving directions, but the lost volume does not reappear without additional entropy production.
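The optimal-transport baseline in Figure 1 has a closed form when both endpoints are Gaussian. The sketch below assumes diagonal (commuting) covariances; in that case the Wasserstein-2 geodesic interpolates the means and the per-coordinate standard deviations linearly. The endpoint values are hypothetical, chosen only to mimic a stiff $x$-direction and a flat $y$-direction, and are not the paper's experimental settings. Along this reference path the spread in each direction moves monotonically between its endpoint values, so it never falls below the target's spread. This is the sense in which the OT geodesic preserves support.

```python
import math


def w2_gaussian_diag(m0, s0, m1, s1):
    """W2 distance between N(m0, diag(s0**2)) and N(m1, diag(s1**2))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(m0, m1))
                     + sum((a - b) ** 2 for a, b in zip(s0, s1)))


def ot_geodesic(m0, s0, m1, s1, t):
    """Point at time t in [0, 1] on the W2 geodesic between diagonal Gaussians.

    Displacement interpolation with the monotone map
    T(x) = m1 + diag(s1 / s0) (x - m0): means and per-coordinate
    standard deviations both interpolate linearly.
    """
    means = [(1 - t) * a + t * b for a, b in zip(m0, m1)]
    stds = [(1 - t) * a + t * b for a, b in zip(s0, s1)]
    return means, stds


# Hypothetical endpoints: stiff x-direction (target std 0.2) and
# flat y-direction (target std 0.8); not the paper's experimental values.
m0, s0 = [2.0, 1.0], [1.0, 1.0]
m1, s1 = [0.0, 0.0], [0.2, 0.8]

path = [ot_geodesic(m0, s0, m1, s1, i / 10) for i in range(11)]
stds_y = [stds[1] for _, stds in path]

print(min(stds_y))  # 0.8: the y-spread never drops below the target's
# Constant-speed geodesic: W2 between times 0.3 and 0.7 is 0.4 of the total.
segment = w2_gaussian_diag(*path[3], *path[7])
print(math.isclose(segment, 0.4 * w2_gaussian_diag(m0, s0, m1, s1)))  # True
```

A learning trajectory whose variance along the flat direction falls below this reference curve, as the SGD snapshots in Figure 1 do, has lost support that a minimal-dissipation path would have kept.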

Theorems & Definitions (7)

  • Proposition 1: Dissipation-induced contraction implies entropy collapse
  • Proof sketch
  • Proposition 2: Reachability obstruction
  • Proof sketch
  • Proposition 3: Lower bound on forgetting
  • Proof sketch
  • Definition 1: Trajectory-level compatibility