On the Temperature of Machine Learning Systems

Dong Zhang

TL;DR

This work constructs a thermodynamic lens for machine learning by defining energy and entropy analogues within ML systems and introducing temperature as a principled diagnostic for training complexity and data distribution changes. It centers on two ML states, Type I (parameter-initialization phase) and Type II (data-shift evolution), and models training as an isothermal phase transition with temperature computable from energy-entropy changes. The paper derives analytical and asymptotic temperature expressions for linear models under different parameter initializations (normal, uniform, mixed) and loss forms (MSE, MAE, Cross Entropy), then extends the framework to neural networks, showing global and per-layer temperatures and a heat-engine interpretation with work efficiency dependent on activation functions. The resulting temperature characterizations reveal how data geometry, initialization, and architecture interact to shape training dynamics, offering a first-principles perspective that complements empirical ML practice. This framework provides a foundation for comparing architectures and understanding retraining under data shifts through thermodynamic quantities like $E$, $S$, and $T$, with potential implications for model selection, continual learning, and data-centric ML strategies.

Abstract

We develop a thermodynamic theory for machine learning (ML) systems. Like physical thermodynamic systems, which are characterized by energy and entropy, ML systems possess these characteristics as well. This comparison inspires us to integrate the concept of temperature into ML systems, grounded in the fundamental principles of thermodynamics, and to establish a basic thermodynamic framework for machine learning systems with non-Boltzmann distributions. We introduce the concept of states within an ML system, identify two typical types of state, and interpret model training and refresh as a process of state phase transition. We consider that the initial potential energy of an ML system is described by the model's loss functions, and that the energy adheres to the principle of minimum potential energy. For a variety of energy forms and parameter initialization methods, we derive the temperature of systems during the phase transition both analytically and asymptotically, highlighting temperature as a vital indicator of system data distribution and ML training complexity. Moreover, we perceive deep neural networks as complex heat engines with both a global temperature and local temperatures in each layer. The concept of work efficiency is introduced within neural networks, and it depends mainly on the neural activation functions. We then classify neural networks based on their work efficiency, and describe neural networks as two types of heat engines.
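The idea of reading a temperature off the energy and entropy changes of training can be illustrated with a toy numerical sketch. It is not the paper's derivation; it rests on several assumptions made only for illustration: the energy is the MSE of a linear model, the trained state is the least-squares solution, the entropy is a discrete entropy in units where $k_B = 1$ that drops from $\ln N$ (for $N$ equally likely initial parameter "particles") to $0$ (all particles at the trained point), and the temperature is taken as $T = \Delta E / \Delta S$. The names `mse_energy` and `toy_temperature` are hypothetical.

```python
import numpy as np


def mse_energy(w, X, y):
    """Mean squared error of the linear model X @ w, used here as the system energy."""
    residual = X @ w - y
    return float(np.mean(residual ** 2))


def toy_temperature(X, y, n_particles=1000, init_std=1.0, seed=0):
    """Toy estimate of T = delta_E / delta_S for one training transition.

    Assumptions (illustrative only, not taken from the paper):
    - untrained state: n_particles parameter vectors drawn from N(0, init_std^2)
    - trained state: every particle sits at the least-squares solution
    - discrete entropy with k_B = 1, going from ln(n_particles) to 0
    """
    rng = np.random.default_rng(seed)

    # Untrained state: average energy over randomly initialized parameters.
    W0 = rng.normal(0.0, init_std, size=(n_particles, X.shape[1]))
    initial_energy = float(np.mean([mse_energy(w, X, y) for w in W0]))

    # Trained state: energy at the least-squares minimizer.
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    trained_energy = mse_energy(w_hat, X, y)

    # Energy and entropy both decrease during training, so their ratio is positive.
    delta_E = trained_energy - initial_energy
    delta_S = 0.0 - np.log(n_particles)
    return delta_E / delta_S


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 2))
    y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=50)
    print(toy_temperature(X, y, n_particles=200))
```

Under these assumptions the estimate depends on the arbitrary particle count through $\ln N$, which is one reason a careful treatment of entropy (discrete versus differential, parameter versus data) is needed before the quantity can be interpreted.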

Paper Structure (51 sections, 174 equations, 8 figures, 3 tables)

This paper contains 51 sections, 174 equations, 8 figures, 3 tables.

Introduction
General Theory
State of a Machine Learning system
Type I State and Phase Transition.
Type II State.
System Energy
System Entropy
Discrete Entropy and Differential Entropy.
Parameter Entropy and Data Entropy.
Non-Boltzmann Distribution
ML vs Physics Perspective, and the Following Content
Linear Regression with MSE
Parameter Initialization: Normal Distribution
2D Linear Regression
High-dimensional Linear Regression
...and 36 more sections

Figures (8)

Figure 1: A machine learning (ML) system includes the initial model design, setting of initial parameters, importing data for training, importing new data for prediction tasks, and the process of keeping the model refreshed with new data. From a physics perspective, such a system with a series of steps is analogous to a heat engine. We can examine the various temperatures of the system during these processes, as well as the changes in energy and entropy.
Figure 2: Three fundamental elements of a basic machine learning (ML) system.
Figure 3: Type I state of an ML system (State I), which is the state for a given dataset with a set of parameters ${\pmb \mu}$. This state represents an ML system that has not yet been trained, and each $\{E_i, \pmb \mu_i\}$ can be considered as a particle. After training, all the particles converge to $\{\hat{E}, \hat{\pmb \mu}\}$, which is State II. The transition from State I to State II is the process of the ML system going from its initial state to its trained state. It can also be viewed as a phase transition process from State I to State II. We can use an isothermal phase transition process to calculate the temperature of the system.
Figure 4: The evolution of the Type II state of an ML system (discrete case). The training dataset of the system is constantly evolving from $\mathcal{D}_1 = \mathcal{X}_1 \times \mathcal{Y}_1$ to $\mathcal{D}_2 = \mathcal{X}_2 \times \mathcal{Y}_2$ to $\mathcal{D}_3$..., forming a sequence $\{\mathcal{D}_k\}$. The fixed model parameter ${\pmb \mu_k}$ and $\mathcal{D}_k$ give a Type II state with energy $\hat{E}(\mathcal{D}_k)$, and the entropy corresponding to the training dataset can be calculated using Equation (\ref{equ_stateII_entropy}). As the Type II state evolves, we can calculate the temperature of the state and its evolution through the changes in energy and entropy.
Figure 5: Energy of the system in $\mathcal{Y}$ space. The left figure shows that the system has "long-range energy", that is, there is energy $\mathscr{V}(y_i, y_j)$ between any internal particles $y_i$ and $y_j$ within the system. Also, there is energy $\mathscr{V}(\hat{y}_k, y_i)$ between any "position" $\hat{y}_k$ and any internal particle $y_i$. Equation (\ref{equ_energy_1}) describes the energy of such a system. The right figure shows a simplified energy model where the system has "short-range energy". In this model, any position $\hat{y}_k$ within the system interacts only with its nearest internal particle $y_k$. The mutual energy between internal particles is ignored due to its lack of variation. Equation (\ref{equ_energy_2}) describes such a system.
...and 3 more figures
