Thermodynamic bounds on energy use in quasi-static Deep Neural Networks
Alexei V. Tkachenko
TL;DR
This work develops a thermodynamic framework by mapping quasi-static analog DNNs to a free-energy functional, showing that inference can, in principle, be thermodynamically reversible with zero energy cost, while training requires finite dissipation that scales with the number of parameters and dataset size. Training emerges from parameter annealing, where stresses induced by clamped inputs/outputs drive backpropagation-like updates without an explicit loss function. The author derives a universal lower bound, $E_{train} \gtrsim 2 N D k_B T$, and discusses implications for energy-efficient analog computing versus digital hardware bound by Landauer limits. The results provide a principled link between neural computation and thermodynamics, suggesting reversible inference is feasible in analog substrates but learning will incur unavoidable energy costs.
Abstract
The rapid growth of deep neural networks (DNNs) has brought increasing attention to their energy use during training and inference. Here, we establish the thermodynamic bounds on energy consumption in quasi-static analog DNNs by mapping modern feedforward architectures onto a physical free-energy functional. This framework provides a direct statistical-mechanical interpretation of quasi-static DNNs. As a result, inference can proceed in a thermodynamically reversible manner, with vanishing minimal energy cost, in contrast to the Landauer limit that constrains digital hardware. Importantly, inference corresponds to relaxation to a unique free-energy minimum with $F_{\min}=0$, allowing all constraints to be satisfied without residual stress. By comparison, training overconstrains the system: simultaneous clamping of inputs and outputs generates stresses that propagate backward through the architecture, reproducing the rules of backpropagation. Parameter annealing then relaxes these stresses, providing a purely physical route to learning without an explicit loss function. We further derive a universal lower bound on training energy, $E_{train} \gtrsim 2 N D k_B T$, which scales with both the number of trainable parameters $N$ and the dataset size $D$.
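To give a sense of scale for the training bound, the following minimal sketch evaluates $2 N D k_B T$ numerically. The function name `training_energy_bound`, the example values of $N$ and $D$, the temperature of 300 K, and the reading of $D$ as a count of training examples are all illustrative assumptions, not values taken from the paper.

```python
# Boltzmann constant in J/K (exact value in the SI system).
K_B = 1.380649e-23


def training_energy_bound(n_params: float, dataset_size: float,
                          temperature_k: float = 300.0) -> float:
    """Return the lower bound 2 * N * D * k_B * T in joules.

    n_params      -- number of trainable parameters N
    dataset_size  -- dataset size D (assumed here to be a count of examples)
    temperature_k -- operating temperature T in kelvin (300 K is an assumption)
    """
    return 2.0 * n_params * dataset_size * K_B * temperature_k


if __name__ == "__main__":
    # Hypothetical model: 1e9 parameters trained on 1e6 examples at 300 K.
    bound_joules = training_energy_bound(1e9, 1e6)
    print(f"Lower bound on training energy: {bound_joules:.3e} J")
```

For these hypothetical values the bound comes out on the order of $10^{-5}$ J, and it grows linearly in each of $N$, $D$, and $T$.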
