Stochastic Thermodynamics of Learning Parametric Probabilistic Models
Shervin Sadat Parsi
TL;DR
This work reframes the learning of Parametric Probabilistic Models (PPMs) as a thermodynamic process, introducing Memorized Information (M-info) and Learned Information (L-info) to quantify the information stored in the parameters and the task-aligned learning, respectively. By modeling the joint dynamics of model outputs X and parameters Θ with lagged bipartite dynamics and Local Detailed Balance, it links information flow to entropy production and identifies Θ as a high-capacity heat reservoir that stores learned information through the learned-data exchange. Using the Detailed Fluctuation Theorem, the work connects L-info to interval entropy production and describes an ideal, quasi-static learning regime in which all memorized information is relevant and conditional entropy production vanishes, at the cost of increased computation. The framework offers a thermodynamic explanation for why over-parameterization and slow, lazy dynamics can aid generalization, and a principled way to diagnose and quantify information flow and energy exchange during training under both naive and more realistic reservoir models.
Abstract
We formulate a family of machine learning problems as the time evolution of Parametric Probabilistic Models (PPMs), which makes learning inherently a thermodynamic process. Our primary motivation is to leverage the rich toolbox of the thermodynamics of information to assess the information-theoretic content of learning a probabilistic model. We first introduce two information-theoretic metrics, Memorized-information (M-info) and Learned-information (L-info), which trace the flow of information during the learning process of PPMs. We then demonstrate that the accumulation of L-info during learning is associated with entropy production, and that the parameters serve as a heat reservoir in this process, capturing learned information in the form of M-info.
