Optimization by linear kinetic equations and mean-field Langevin dynamics

Lorenzo Pareschi

TL;DR

The analysis highlights the strong link between linear Boltzmann equations and stochastic optimization methods governed by Markov processes, and shows how convergence to the global minimum can be related to classical entropy inequalities.

Abstract

Probably one of the most striking examples of the close connections between global optimization processes and statistical physics is the simulated annealing method, inspired by the famous Monte Carlo algorithm devised by Metropolis et al. in the middle of the last century. In this paper we show how the tools of linear kinetic theory allow us to describe this gradient-free algorithm from the perspective of statistical physics, and how convergence to the global minimum can be related to classical entropy inequalities. This analysis highlights the strong link between linear Boltzmann equations and stochastic optimization methods governed by Markov processes. Thanks to this formalism, we can establish the connections between the simulated annealing process and the corresponding mean-field Langevin dynamics, characterized by a stochastic gradient descent approach. Generalizations to other selection strategies in simulated annealing that avoid the acceptance-rejection dynamic are also provided.
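To make the acceptance-rejection dynamic mentioned in the abstract concrete, the following is a minimal sketch of classical simulated annealing with a Metropolis acceptance rule. It is not the paper's kinetic formulation. The one-dimensional Ackley test function, the Gaussian proposal width `sigma`, the step count, and the use of the iteration index as time in the logarithmic cooling schedule are all assumptions chosen for illustration.

```python
import math
import random


def ackley(x):
    """One-dimensional Ackley function (illustrative choice); global minimum 0 at x = 0."""
    return (-20.0 * math.exp(-0.2 * abs(x))
            - math.exp(math.cos(2.0 * math.pi * x))
            + 20.0 + math.e)


def simulated_annealing(f, x0, n_steps=20000, sigma=0.5, t0=2.0, seed=0):
    """Gradient-free minimization with Metropolis acceptance-rejection.

    sigma, t0, n_steps and the schedule below are hypothetical parameters.
    """
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for k in range(n_steps):
        # Logarithmic cooling, with the iteration index k playing the role of time.
        temp = t0 * math.log(2.0) / math.log(2.0 + k)
        # Symmetric proposal: Gaussian perturbation of the current point.
        x_new = x + sigma * rng.gauss(0.0, 1.0)
        f_new = f(x_new)
        # Metropolis rule: always accept improvements, otherwise accept
        # with probability exp(-(f_new - fx) / temp).
        if f_new <= fx or rng.random() < math.exp(-(f_new - fx) / temp):
            x, fx = x_new, f_new
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f


x_best, f_best = simulated_annealing(ackley, x0=3.0)
print(x_best, f_best)
```

Only evaluations of `f` are used, never its gradient, which is what makes the method gradient-free. The proposal is symmetric, so the acceptance probability depends only on the change in the objective and the current temperature.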

Paper Structure (11 sections, 4 theorems, 62 equations, 4 figures, 2 algorithms)


Key Result

Lemma 3.1

For any symmetric probability density $p(\xi)$ and any integrable function $g(x,x')$ we have

Figures (4)

  • Figure 1: The probability to accept a trial point in simulated annealing
  • Figure 2: The prototype Ackley function (left) and the corresponding steady states (right) given by the Gibbs measure \ref{eq:Gibbs} for various values of the control temperature.
  • Figure 3: Solution of KSA and MSA for a fixed control temperature $T=2$ for $\varepsilon=0.01$ (left) and $\varepsilon=0.0001$ (right). Top: the probability density at the final time $t=2$; bottom: the relative entropies along the simulation. For reference we also report the mean-field (Reference) and the stochastic Langevin dynamics (MFL) results. All plots were obtained by averaging over $N=5\times 10^4$ runs.
  • Figure 4: Solution of KSA and MSA for a time-dependent control temperature $T(t)=2\log(2)/\log(2+t)$ for $\varepsilon=0.01$ (left) and $\varepsilon=0.0001$ (right). Top: the probability density at the final time $t=20$; bottom: the relative entropies along the simulation. For reference we also report the mean-field (Reference) and the stochastic Langevin dynamics (MFL) results. All plots were obtained by averaging over $N=5\times 10^4$ runs.
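The quantities named in the captions of Figures 2 and 4 can be sketched numerically. The snippet below evaluates the cooling schedule $T(t)=2\log(2)/\log(2+t)$ from Figure 4 and a normalized Gibbs density on a grid. It assumes the usual form of the Gibbs measure, proportional to $\exp(-F(x)/T)$. The one-dimensional Ackley function, the domain $[-5,5]$, the grid resolution, and the helper names are illustrative assumptions, not taken from the paper.

```python
import math


def ackley(x):
    """One-dimensional Ackley function (illustrative choice); global minimum 0 at x = 0."""
    return (-20.0 * math.exp(-0.2 * abs(x))
            - math.exp(math.cos(2.0 * math.pi * x))
            + 20.0 + math.e)


def temperature(t):
    """Time-dependent control temperature from the Figure 4 caption."""
    return 2.0 * math.log(2.0) / math.log(2.0 + t)


def gibbs_density(f, grid, temp):
    """Density proportional to exp(-f(x)/temp), normalized on a uniform grid (assumed form)."""
    f_min = min(f(x) for x in grid)  # shift by the minimum to avoid underflow
    weights = [math.exp(-(f(x) - f_min) / temp) for x in grid]
    dx = grid[1] - grid[0]
    z = sum(weights) * dx  # Riemann-sum approximation of the normalizing constant
    return [w / z for w in weights]


def mass_near_zero(grid, density, radius=0.5):
    """Probability mass the density assigns to |x| <= radius."""
    dx = grid[1] - grid[0]
    return sum(d for x, d in zip(grid, density) if abs(x) <= radius) * dx


# Hypothetical domain [-5, 5] with 2001 uniformly spaced grid points.
grid = [-5.0 + 10.0 * i / 2000 for i in range(2001)]
masses = {temp: mass_near_zero(grid, gibbs_density(ackley, grid, temp))
          for temp in (2.0, 1.0, 0.5)}
for temp, mass in masses.items():
    print(f"T = {temp}: mass near the global minimum = {mass:.3f}")
```

Lowering the temperature concentrates the density around the global minimizer, which is the behavior the steady states in Figure 2 illustrate for various control temperatures.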

Theorems & Definitions (7)

  • Lemma 3.1
  • Lemma 3.2
  • Remark 3.1
  • Theorem 3.1
  • Remark 3.2
  • Theorem 3.2
  • Remark 3.3