Minimal-Dissipation Learning for Energy-Based Models
Jeff Hnybida, Simon Verret
TL;DR
This work connects the bias of approximate MLE training for persistent-chain EBMs to the thermodynamic excess work, which yields a lower bound on the energy required for finite-time learning. By analyzing a harmonic-trap EBM, it shows that minimal-dissipation learning is achievable with time-dependent learning-rate protocols, both continuous and discontinuous, and that discontinuities enable learning of unknown targets under equilibrium initialization. The authors then extend these ideas to general potentials, deriving a learning-rate matrix that induces a natural gradient flow on the MLE objective and thereby links stochastic thermodynamics, information geometry, and second-order optimization. The results offer principled guidance for energy-efficient training and show how thermodynamic insights can inform learning-rate design and potential hardware implementations for thermodynamic computing.
Abstract
We show that the bias of the approximate maximum-likelihood estimation (MLE) objective of a persistent-chain energy-based model (EBM) is precisely equal to the thermodynamic excess work of an overdamped Langevin dynamical system. We then ask whether such a model can be trained with minimal excess work, that is, minimal energy dissipation, in a finite amount of time. We find that a Gaussian energy function with constant variance can be trained with minimal excess work by controlling only the learning rate. This proves that a persistent-chain EBM can be trained in a finite amount of time with minimal dissipation, and it also provides a lower bound on the energy required for the computation. We refer to a learning process that minimizes the excess work as minimal-dissipation learning. Finally, we generalize the optimal learning-rate schedule to general potentials and find that it induces a natural gradient flow on the MLE objective, a well-known second-order optimization method.
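To make the setting concrete, the following is a minimal sketch of persistent-chain training for a Gaussian energy function with constant variance, E_theta(x) = (x - theta)^2 / (2 sigma^2). It is an illustration under stated assumptions, not the optimal protocol derived in the paper: the learning rate is held constant, the chain is advanced with one Euler-Maruyama step of overdamped Langevin dynamics per parameter update, and the function name `train_harmonic_ebm` and all default values are hypothetical choices made for this example.

```python
import numpy as np


def train_harmonic_ebm(data, n_chains=5000, n_steps=2000,
                       learning_rate=0.05, langevin_step=0.1,
                       sigma=1.0, seed=0):
    """Fit the mean theta of E_theta(x) = (x - theta)^2 / (2 sigma^2).

    Illustrative only: a constant learning rate is assumed here,
    not a minimal-dissipation schedule.
    """
    rng = np.random.default_rng(seed)
    theta = 0.0
    # Start the persistent chain at equilibrium for the initial theta.
    chain = rng.normal(theta, sigma, size=n_chains)
    data_mean = data.mean()

    for _ in range(n_steps):
        # One overdamped Langevin step: dx = -dE/dx dt + sqrt(2 dt) * noise.
        force = -(chain - theta) / sigma**2
        noise = rng.normal(size=n_chains)
        chain = chain + langevin_step * force + np.sqrt(2.0 * langevin_step) * noise

        # Approximate MLE gradient: -E_data[dE/dtheta] + E_chain[dE/dtheta],
        # which reduces to (data mean - chain mean) / sigma^2 for this energy.
        grad = (data_mean - chain.mean()) / sigma**2
        theta += learning_rate * grad

    return theta


if __name__ == "__main__":
    # Synthetic target with mean 2.0 (an assumption made for the demo).
    data = np.random.default_rng(1).normal(2.0, 1.0, size=5000)
    print("data mean:", data.mean())
    print("learned theta:", train_harmonic_ebm(data))
```

Because the chain is never re-equilibrated, its mean lags behind theta whenever theta moves, so the gradient estimate is biased during training. That lag is the quantity the abstract relates to excess work; the sketch does not compute the excess work itself.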
