Fluctuation-dissipation relations for stochastic gradient descent

Sho Yaida

TL;DR

Problem: relate minibatch noise in stochastic gradient descent (SGD) to parameter dynamics once training has reached stationarity. Approach: derive exact, stationarity-based fluctuation-dissipation relations (FDRs) within a discrete-time master-equation framework that accommodates non-Gaussian noise and nonconvex loss landscapes. Key findings: the first relation (FDR1) provides a practical equilibration metric and an adaptive learning-rate schedule; the second (FDR2) probes the loss landscape through the magnitudes of its Hessian and anharmonicity. Experiments on MNIST and CIFAR-10 confirm the relations and demonstrate the practical utility of adaptive scheduling.
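The stationarity argument can be illustrated with a short derivation. This is a sketch under stated assumptions, not the paper's full treatment: plain SGD without momentum or weight decay, with update θ_{t+1} = θ_t − η∇f_B(θ_t), an unbiased minibatch gradient (E_B[∇f_B(θ)] = ∇f(θ)), and a minibatch drawn independently of θ_t. Requiring ⟨O(θ_{t+1})⟩ = ⟨O(θ_t)⟩ for the observable O = θ·θ gives a relation of the FDR1 type; choosing O = f gives one of the FDR2 type, involving the Hessian H and, at higher order in η, third derivatives (anharmonicity). The paper's own statements may be organized differently, for example to include momentum.

```latex
% Assumed setup: theta_{t+1} = theta_t - eta * grad f_B(theta_t),
% E_B[grad f_B] = grad f, minibatch B independent of theta_t.
% Stationarity: < O(theta_{t+1}) > = < O(theta_t) > for any observable O.

% (i) O(theta) = theta . theta  (FDR1-type relation)
\[
\begin{aligned}
0 &= \langle (\theta - \eta\nabla f_B)\cdot(\theta - \eta\nabla f_B)\rangle - \langle \theta\cdot\theta\rangle
   = -2\eta\,\langle \theta\cdot\nabla f\rangle + \eta^{2}\,\langle \nabla f_B\cdot\nabla f_B\rangle \\
\Longrightarrow\quad \langle \theta\cdot\nabla f\rangle &= \frac{\eta}{2}\,\langle \nabla f_B\cdot\nabla f_B\rangle .
\end{aligned}
\]

% (ii) O(theta) = f(theta), Taylor-expanded in eta  (Hessian / anharmonicity relation)
% Repeated indices i, j, k are summed.
\[
\begin{aligned}
0 &= \langle f(\theta - \eta\nabla f_B)\rangle - \langle f(\theta)\rangle \\
  &= -\eta\,\langle \nabla f\cdot\nabla f\rangle
     + \frac{\eta^{2}}{2}\,\langle \nabla f_B^{\top} H\,\nabla f_B\rangle
     - \frac{\eta^{3}}{6}\,\langle \partial_i\partial_j\partial_k f\,
       (\nabla f_B)_i(\nabla f_B)_j(\nabla f_B)_k\rangle + O(\eta^{4}) \\
\Longrightarrow\quad \langle \nabla f\cdot\nabla f\rangle
  &= \frac{\eta}{2}\,\langle \nabla f_B^{\top} H\,\nabla f_B\rangle + O(\eta^{2}),
\end{aligned}
\]
% where, with C the covariance of the minibatch gradient,
% E_B[ grad f_B^T H grad f_B ] = grad f^T H grad f + Tr(H C).
```

For a purely quadratic loss the third-derivative terms vanish and relation (ii) holds exactly; the size of the O(η²) correction is what carries information about anharmonicity.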

Abstract

The notion of the stationary equilibrium ensemble has played a central role in statistical mechanics. In machine learning as well, training serves as a generalized equilibration that drives the probability distribution of model parameters toward stationarity. Here, we derive stationary fluctuation-dissipation relations that link measurable quantities and hyperparameters in the stochastic gradient descent algorithm. These relations hold exactly for any stationary state and can, in particular, be used to set the training schedule adaptively. We can further use the relations to efficiently extract information about the loss-function landscape, such as the magnitudes of its Hessian and anharmonicity. Our claims are verified empirically.
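The idea of using such a relation as an equilibration metric can be illustrated with a small simulation. The following is a minimal sketch, not the paper's implementation. It assumes plain SGD (no momentum) on a hypothetical noisy quadratic loss f(θ) = ½ Σ h_i θ_i², with minibatch noise modeled as additive Gaussian noise. For plain SGD, requiring ⟨|θ_{t+1}|²⟩ = ⟨|θ_t|²⟩ at stationarity gives ⟨θ·∇f⟩ = (η/2)⟨∇f_B·∇f_B⟩, and the sketch compares the two sides of this relation over windows of steps. The helper names and the knobs `window`, `tol`, `decay` and `sigma` are illustrative assumptions, not settings from the paper. When the windowed ratio is within `tol` of 1, the run is treated as stationary and the learning rate is multiplied by `decay`.

```python
import numpy as np


def fdr1_ratio(theta_dot_grad, grad_sq, lr):
    """Sample estimate of O_L / O_R for <theta . grad f> = (lr/2) <|grad f_B|^2>."""
    o_left = np.mean(theta_dot_grad)
    o_right = 0.5 * lr * np.mean(grad_sq)
    return o_left / o_right


def noisy_quadratic_grad(theta, h, sigma, rng):
    """Hypothetical toy model: gradient of f = 0.5 * sum(h * theta**2)
    plus zero-mean Gaussian noise standing in for minibatch noise."""
    return h * theta + sigma * rng.standard_normal(theta.shape)


def stationary_ratio(lr=0.1, burn_in=2_000, steps=20_000, sigma=1.0, seed=1):
    """Run fixed-lr SGD, discard a burn-in, and return the FDR1 ratio."""
    rng = np.random.default_rng(seed)
    h = np.linspace(0.5, 2.0, 10)           # curvatures of the toy loss
    theta = np.zeros_like(h)
    td, gs = [], []
    for step in range(burn_in + steps):
        g = noisy_quadratic_grad(theta, h, sigma, rng)
        if step >= burn_in:
            td.append(theta @ g)            # unbiased estimate of theta . grad f
            gs.append(g @ g)                # |grad f_B|^2
        theta = theta - lr * g              # plain SGD update
    return fdr1_ratio(td, gs, lr)


def sgd_with_fdr1_schedule(lr=0.1, steps=20_000, window=1_000,
                           tol=0.1, decay=0.5, sigma=1.0, seed=0):
    """Plain SGD whose learning rate is multiplied by `decay` whenever the
    windowed FDR1 ratio lies within `tol` of 1. Returns the lr history."""
    rng = np.random.default_rng(seed)
    h = np.linspace(0.5, 2.0, 10)
    theta = 3.0 * np.ones_like(h)           # start far from the minimum
    td, gs = [], []                         # samples for the current window
    lr_history = [lr]
    for step in range(1, steps + 1):
        g = noisy_quadratic_grad(theta, h, sigma, rng)
        td.append(theta @ g)                # uses theta *before* the update
        gs.append(g @ g)
        theta = theta - lr * g
        if step % window == 0:
            if abs(fdr1_ratio(td, gs, lr) - 1.0) < tol:
                lr *= decay                 # judged stationary: anneal
                lr_history.append(lr)
            td, gs = [], []                 # start a fresh window either way
    return lr_history
```

For this toy model the relation holds exactly at stationarity, so `stationary_ratio()` should come out close to 1 up to sampling error, while the first window of `sgd_with_fdr1_schedule()` (still relaxing from θ = 3) gives a ratio well above 1 and does not trigger a decay. The sketch accumulates the minibatch product θ·∇f_B instead of the full-batch θ·∇f. That keeps the estimate cheap and still unbiased, because the minibatch is drawn independently of θ_t, at the cost of extra variance.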

Figures (6)