Control of Overfitting with Physics

Sergei V. Kozyrev, Ilya A Lopatin, Alexander N Pechen

TL;DR

This work connects overfitting control in machine learning to physics and biology by linking stochastic gradient Langevin dynamics to the Eyring formula and by mapping GAN dynamics to a predator–prey system. It argues that learning preferentially occupies wide, low free-energy minima, with temperature tuning guiding exploration versus exploitation, and that coupling discriminator–generator dynamics further biases toward broad likelihood maxima. The paper introduces a branching random-process extension to model populations of discriminators and generators, and it validates the ideas through simulations on multi-well objectives and a Wine dataset, illustrating reduced overfitting and improved generalization. Together, these analogies provide a theoretical lens for understanding generalization and suggest practical mechanisms for improving stability in SGLD and GAN training.

Abstract

While many works apply machine learning, few of them try to provide a theoretical justification for its efficiency. In this work, overfitting control (the generalization property) in machine learning is explained using analogies from physics and biology. For stochastic gradient Langevin dynamics, we show that the Eyring formula of kinetic theory makes it possible to control overfitting within the algorithmic stability approach: wide minima of the risk function with low free energy correspond to low overfitting. For the generative adversarial network (GAN) model, we establish an analogy between GANs and the predator-prey model in biology. Applying this analogy allows us to explain the selection of wide likelihood maxima and the reduction of overfitting for GANs.
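The preference for wide minima can be illustrated with a small numerical experiment. The following is a minimal sketch, not the paper's setup: it uses plain Langevin dynamics with the exact gradient as a stand-in for SGLD (no minibatch noise), and the one-dimensional objective, its well positions and widths, the confinement term, the step size, and the inverse temperature `beta` are all hypothetical choices made for illustration. Every run starts at the bottom of the narrow well, and the code reports the fraction of runs that end on the side of the wide well.

```python
import numpy as np

# Hypothetical 1D objective (an assumption, not taken from the paper):
# a narrow well at x = -2 and a wide well at x = +2 of equal depth,
# plus a weak quadratic term that keeps the walkers bounded.
NARROW_CENTER, NARROW_WIDTH = -2.0, 0.2
WIDE_CENTER, WIDE_WIDTH = 2.0, 1.0
CONFINEMENT = 0.02


def grad_loss(x):
    """Gradient of L(x) = -exp(-(x-a)^2/(2 s_n^2)) - exp(-(x-b)^2/(2 s_w^2)) + c x^2."""
    dn = x - NARROW_CENTER
    dw = x - WIDE_CENTER
    g_narrow = dn / NARROW_WIDTH**2 * np.exp(-dn**2 / (2 * NARROW_WIDTH**2))
    g_wide = dw / WIDE_WIDTH**2 * np.exp(-dw**2 / (2 * WIDE_WIDTH**2))
    return g_narrow + g_wide + 2 * CONFINEMENT * x


def fraction_in_wide_well(beta=3.0, step=0.01, n_steps=20_000, n_runs=400, seed=0):
    """Run independent Langevin chains and return the share that end at x > 0."""
    rng = np.random.default_rng(seed)
    # Adversarial start: every chain begins at the bottom of the narrow well.
    x = np.full(n_runs, NARROW_CENTER)
    # Euler-Maruyama step: x <- x - step * grad L(x) + sqrt(2 * step / beta) * xi
    noise_scale = np.sqrt(2.0 * step / beta)
    for _ in range(n_steps):
        x = x - step * grad_loss(x) + noise_scale * rng.standard_normal(n_runs)
    return float(np.mean(x > 0.0))


frac = fraction_in_wide_well()
print(f"fraction of runs ending on the wide-well side: {frac:.2f}")
```

Both wells have the same depth, so a majority on the wide side reflects the entropic (width) contribution to the free energy rather than a lower loss value. Raising `beta` slows escape from the narrow well, which is one way to see the exploration versus exploitation trade-off mentioned above.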

Paper Structure

This paper contains 15 sections, 32 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Thermal plot of the function $\mathcal{L}$ (left) and its gradient field (right). Red dot in the center of the gradient field plot shows the starting point $(0,0)$.
  • Figure 2: Fraction of SGLD runs starting at the point $x_0$ that converge to the wider well $c_1$.
  • Figure 3: Fraction of the points which converge to extrema vs. iteration number plotted for several inverse temperatures $\beta=0, 0.75, 1.5, 2.25, 3.0$.
  • Figure 4: Norm of the vector function $\| V(d) \|$ and its characteristic points $l$ and $A$.
  • Figure 5: The limiting unstable oscillations around the extremum point for the predator–prey model.
  • ...and 1 more figures
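
For orientation on the predator–prey oscillations referred to in Figure 5, the sketch below integrates the classical Lotka–Volterra equations, the textbook predator–prey model. This is an assumption for illustration only: the equations the paper derives for the discriminator–generator dynamics may differ, and the parameter names `a`, `b`, `c`, `d`, the initial state, and the step size are hypothetical. The code uses a hand-written fourth-order Runge–Kutta step and checks the known conserved quantity of the classical system.

```python
import math


def lotka_volterra(state, a=1.0, b=1.0, c=1.0, d=1.0):
    """Classical model: dx/dt = a x - b x y (prey), dy/dt = d x y - c y (predator)."""
    x, y = state
    return (a * x - b * x * y, d * x * y - c * y)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for a tuple-valued state."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (p + 2 * q + 2 * r + w)
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    )


def invariant(state, a=1.0, b=1.0, c=1.0, d=1.0):
    """Conserved quantity H = d x - c ln x + b y - a ln y of the classical system."""
    x, y = state
    return d * x - c * math.log(x) + b * y - a * math.log(y)


def simulate(state=(1.5, 1.0), dt=0.01, n_steps=2000):
    """Integrate from a hypothetical initial state and return the full trajectory."""
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(lotka_volterra, state, dt)
        trajectory.append(state)
    return trajectory


trajectory = simulate()
drift = abs(invariant(trajectory[-1]) - invariant(trajectory[0]))
prey = [x for x, _ in trajectory]
print(f"prey range: [{min(prey):.3f}, {max(prey):.3f}], invariant drift: {drift:.2e}")
```

In this classical form the trajectories are closed orbits around the fixed point $(c/d, a/b)$, so the populations cycle without settling. Any damping, growth, or noise in a modified model would change that picture.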

Theorems & Definitions (1)

  • Example