Controlled stochastic processes for simulated annealing

Vincent Molin, Axel Ringh, Moritz Schauer, Akash Sharma

TL;DR

This work recasts simulated annealing as a time-evolving curve of Gibbs measures $t\mapsto \mu_t$ and proves there exists a minimal-norm velocity field $v_t$ solving the continuity equation $\partial_t\mu_t+\nabla\cdot(v_t\mu_t)=0$ that can guide particles along arbitrarily fast cooling. The velocity field is characterized via optimal transport, and in the Gaussian case it reduces to a simple linear form $v_t(x)=-\frac{\beta'(t)}{2\beta(t)}x$, while in general it is the gradient of a potential solving an elliptic PDE tied to the operator $\mathcal{L}_t=\frac{1}{\beta(t)}\Delta-\langle\nabla U,\nabla\cdot\rangle$. Leveraging $v_t$, the authors construct diffusion and piecewise deterministic Markov processes whose time marginals align with $\mu_t$, and develop tractable OT-based approximations via self-normalized importance sampling to implement an interacting-particle acceleration scheme. Numerical experiments on a double-well potential and standard benchmark functions demonstrate that velocity-guided transport enhances escape from local minima and accelerates convergence relative to classical SA dynamics. The framework thus provides a principled route to bypass logarithmic cooling constraints through transport-based control, with practical particle algorithms that scale to multidimensional optimization tasks.
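The Gaussian formula quoted above can be checked by hand. The derivation below is a sketch under an assumption not stated in this summary: the potential is taken to be $U(x)=\tfrac12|x|^2$, so that $\mu_t=\mathcal{N}(0,\beta(t)^{-1}I_d)$ with density $\rho_t$. The potential $\varphi_t$ is notation introduced here for illustration only.

```latex
\begin{align*}
\rho_t(x) &= \Big(\frac{\beta(t)}{2\pi}\Big)^{d/2}
             \exp\!\Big(-\frac{\beta(t)}{2}|x|^2\Big), \\
\partial_t\rho_t(x) &= \beta'(t)\Big(\frac{d}{2\beta(t)}-\frac{|x|^2}{2}\Big)\rho_t(x), \\
\nabla\cdot(v_t\rho_t)(x) &= \rho_t(x)\,\nabla\cdot v_t(x)
             + \langle v_t(x),\nabla\rho_t(x)\rangle \\
  &= -\frac{\beta'(t)\,d}{2\beta(t)}\,\rho_t(x)
     + \Big\langle -\frac{\beta'(t)}{2\beta(t)}x,\; -\beta(t)\,x\,\rho_t(x)\Big\rangle \\
  &= \beta'(t)\Big(-\frac{d}{2\beta(t)}+\frac{|x|^2}{2}\Big)\rho_t(x), \\
\partial_t\rho_t + \nabla\cdot(v_t\rho_t) &= 0 .
\end{align*}
% v_t is a gradient field, as expected of the minimal-norm solution:
\[
v_t=\nabla\varphi_t,\qquad \varphi_t(x)=-\frac{\beta'(t)}{4\beta(t)}|x|^2,
\qquad
\|v_t\|_{L^2(\mu_t)}=\frac{|\beta'(t)|\sqrt{d}}{2\,\beta(t)^{3/2}} .
\]
```

The last expression agrees with differentiating the closed-form distance $\mathcal{W}_2\big(\mathcal{N}(0,\sigma_1^2I_d),\mathcal{N}(0,\sigma_2^2I_d)\big)=\sqrt{d}\,|\sigma_1-\sigma_2|$ along $\sigma(t)=\beta(t)^{-1/2}$.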

Abstract

Simulated annealing solves global optimization problems by means of a random walk in a cooling energy landscape based on the objective function and a temperature parameter. However, if the temperature is decreased too quickly, this procedure often gets stuck in suboptimal local minima. In this work, we consider the cooling landscape as a curve of probability measures. We prove the existence of a minimal norm velocity field which solves the continuity equation, a differential equation that governs the evolution of the aforementioned curve. The solution is the weak gradient of an integrable function, which is in line with the interpretation of the velocity field as a derivative of optimal transport maps. We show that controlling stochastic annealing processes by superimposing this velocity field would allow them to follow arbitrarily fast cooling schedules. Here we consider annealing processes based on diffusions and piecewise deterministic Markov processes. Based on convergent optimal transport-based approximations to this control, we design a novel interacting particle-based optimization method that accelerates annealing. We validate this accelerating behaviour in numerical experiments.
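To illustrate what "superimposing the velocity field" means, the sketch below compares an uncontrolled and a controlled annealed Langevin diffusion in a one-dimensional toy setting. It is not the authors' particle method: it assumes the quadratic potential $U(x)=x^2/2$, uses the exact Gaussian velocity $v_t(x)=-\frac{\beta'(t)}{2\beta(t)}x$ instead of an optimal-transport estimate, sets the speed scale to 1, and borrows the schedule $\beta(t)=\frac14+25t^2$ from the figure captions. The function names and parameters are hypothetical.

```python
import numpy as np

def beta(t):
    # Quadratic cooling schedule quoted in the figure captions.
    return 0.25 + 25.0 * t**2

def dbeta(t):
    # Time derivative of the schedule above.
    return 50.0 * t

def simulate(controlled, n=20_000, steps=1_000, seed=0):
    """Euler-Maruyama for dX = (-U'(X) + v_t(X)) dt + sqrt(2 / beta(t)) dW on [0, 1].

    Assumption: U(x) = x^2 / 2, so U'(x) = x and mu_t = N(0, 1 / beta(t)).
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / steps
    # Start the particles in the initial Gibbs measure mu_0.
    x = rng.normal(0.0, np.sqrt(1.0 / beta(0.0)), size=n)
    for k in range(steps):
        t = k * dt
        drift = -x  # Langevin drift -U'(x)
        if controlled:
            # Superimpose the exact Gaussian velocity field v_t(x).
            drift = drift - dbeta(t) / (2.0 * beta(t)) * x
        x = x + drift * dt + np.sqrt(2.0 * dt / beta(t)) * rng.normal(size=n)
    return x

target_var = 1.0 / beta(1.0)                  # variance of mu_1
var_plain = simulate(controlled=False).var()  # lags behind the fast schedule
var_ctrl = simulate(controlled=True).var()    # tracks mu_t up to discretization error
print(f"target {target_var:.4f}, uncontrolled {var_plain:.4f}, controlled {var_ctrl:.4f}")
```

In this toy case the uncontrolled variance can only relax at the fixed rate set by $U$, so it stays well above $1/\beta(1)$, while the controlled dynamics keep the time marginal close to $\mu_t$.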

Paper Structure

This paper contains 31 sections, 17 theorems, 165 equations, 10 figures, 2 algorithms.

Key Result

Proposition 2.1

Let $\mu,\nu\in\mathscr{P}_p(\mathbb{R}^d)$. If $\mu$ has a Lebesgue density, then there exists a Monge map solving eq:wass_p, that is, a measurable map $T\colon\mathbb{R}^d\to\mathbb{R}^d$ such that $\nu=T_\#\mu$ and $\mathcal{W}_p^p(\mu,\nu)=\int_{\mathbb{R}^d}\|x-T(x)\|^p\,d\mu(x)$. This corresponds to the deterministic optimal coupling $\gamma^*$, given by $\gamma^*=(\mathrm{Id},T)_\#\mu$. Further, when $p>1$, $T$ is $\mu$-almost everywhere unique and we denote by ${T_{\mu\to\nu}}$ the o
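A one-dimensional Gaussian example makes the Monge map concrete. The sketch below relies on two standard facts rather than on anything specific to the paper: in one dimension the monotone map is optimal for $p>1$, and for $\mu=\mathcal{N}(m_1,s_1^2)$, $\nu=\mathcal{N}(m_2,s_2^2)$ it is affine with $\mathcal{W}_2^2(\mu,\nu)=(m_1-m_2)^2+(s_1-s_2)^2$. All function names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_monge_map_1d(x, m1, s1, m2, s2):
    """Monotone (hence optimal for p > 1) map pushing N(m1, s1^2) to N(m2, s2^2)."""
    return m2 + (s2 / s1) * (x - m1)

def empirical_w2_1d(x, y):
    """W_2 between two equal-size 1D samples via the sorted (monotone) coupling."""
    xs, ys = np.sort(x), np.sort(y)
    return np.sqrt(np.mean((xs - ys) ** 2))

rng = np.random.default_rng(0)
m1, s1, m2, s2 = 0.0, 2.0, 1.0, 0.5
x = rng.normal(m1, s1, size=100_000)  # samples of mu
y = rng.normal(m2, s2, size=100_000)  # independent samples of nu
Tx = gaussian_monge_map_1d(x, m1, s1, m2, s2)  # samples of T_# mu

# Closed-form W_2 between the two Gaussians.
w2_closed_form = np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)
# Cost of the deterministic coupling (x, T(x)), estimated by Monte Carlo.
w2_from_map = np.sqrt(np.mean((x - Tx) ** 2))
# Sample-based estimate that never uses the closed-form map.
w2_from_sorting = empirical_w2_1d(x, y)
print(w2_closed_form, w2_from_map, w2_from_sorting)
```

The three numbers should agree up to Monte Carlo error, which reflects that the coupling $(\mathrm{Id},T)_\#\mu$ attains the Wasserstein distance.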

Figures (10)

  • Figure 1: The double-well potential $U$ (left) and the cooling schedule $\beta$ of \ref{sec:exp_doublewell}.
  • Figure 2: Heat maps of empirical probability densities for the Gibbs measures corresponding to the double-well potential $U$ in \ref{eq:doublewell} and the quadratic cooling schedule $\beta(t) = \frac{1}{4} + 25t^2$, for $0 \leq t \leq 1$. The Langevin versions are simulated with 1000 steps of length $\lambda\Delta t = 25\cdot 10^{-3}$. For the controlled versions we set $h=2\cdot 10^{-2}$, that is, we compute a new velocity estimate 50 times. The upper right panel shows that the rapid increase of $\beta$ causes particles obeying the independent Langevin dynamics to get stuck in the suboptimal well. In contrast, the two bottom panels show that the interactions in the proposed method allow particles to escape the local minimum. Histograms are binned averages over 10,000 particle trajectories, coloured with an inverse hyperbolic sine colour scale.
  • Figure 3: Heat maps of empirical probability densities for the Gibbs measures corresponding to the double-well potential $U$ in \ref{eq:doublewell} and the quadratic cooling schedule $\beta(t) = \frac{1}{4} + 25t^2$, for $0\leq t \leq 1$. The PDSA versions are simulated with a speed scale $\lambda = 25$, and for the controlled versions we set $h=2\cdot 10^{-2}$, that is, we compute a new velocity estimate 50 times. As in the Langevin case in \ref{fig:doublewell_langevin_hists}, particles following the independent PDSA dynamics get stuck in the suboptimal well. In contrast, the interactions in the proposed method allow particles to escape the local minimum. Histograms are binned averages over 10,000 particle trajectories, coloured with an inverse hyperbolic sine colour scale.
  • Figure 4: The left panel shows the distance of the time marginals from the Langevin simulations in \ref{fig:doublewell_langevin_hists}. Here $n$ is the population size: $n=1$ corresponds to independent dynamics, while $n=2,5,10$ denote controlled versions. The largest deviation for the controlled methods occurs around $t=0.4$, where the Gibbs curve has its largest rate of change, as quantified by the metric derivative $|\mu'|(t) = \|v_t\|_{L^2(\mu_t)}$ shown in the right panel.
  • Figure 5: Convergence to the double-well Gibbs curve as the number of particles $n$ increases and the velocity update interval $h$ decreases. Varying the number of particles $n$ (horizontal axis) and the velocity update interval $h$ (colour), we average 1,000 simulations to estimate the time marginals $\rho^{(n,h)}_t$ of the simulations, and compute the time averaged $\mathcal{W}_2$-distance, $\overline{\mathcal{W}}_2 = \int_0^1 \mathcal{W}_2(\rho^{(n,h)}_t,\mu_t)\mathop{}\!dt$. Both axes are logarithmically scaled. For smaller $h$, the average distance of the controlled PDSA marginals seems to approach a linear law in this scaling. This suggests that the average distance of the controlled PDSA marginals approaches a power law with respect to the number of particles, while we do not observe this in the Langevin setting.
  • ...and 5 more figures
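The time-averaged distance $\overline{\mathcal{W}}_2 = \int_0^1 \mathcal{W}_2(\rho^{(n,h)}_t,\mu_t)\,dt$ from the Figure 5 caption can be approximated from samples. The sketch below is one possible way to do this, not the authors' evaluation code: it assumes one-dimensional, equal-size sample sets at each time point, assumes that reference samples of $\mu_t$ are available, and uses a trapezoidal rule in time. All names are hypothetical.

```python
import numpy as np

def w2_1d(x, y):
    """W_2 between two equal-size 1D samples via the sorted (monotone) coupling."""
    xs, ys = np.sort(x), np.sort(y)
    return np.sqrt(np.mean((xs - ys) ** 2))

def time_averaged_w2(ts, sim_samples, ref_samples):
    """Trapezoidal approximation of the time integral of W_2(rho_t, mu_t) over ts."""
    d = np.array([w2_1d(x, y) for x, y in zip(sim_samples, ref_samples)])
    return float(np.sum(0.5 * (d[1:] + d[:-1]) * np.diff(ts)))

# Toy check: the "simulation" is the reference shifted by 0.3 at every time,
# so W_2 equals 0.3 at each grid point and the time average over [0, 1] is 0.3.
rng = np.random.default_rng(0)
ts = np.linspace(0.0, 1.0, 11)
ref = [rng.normal(0.0, 1.0 / np.sqrt(0.25 + 25.0 * t**2), size=1_000) for t in ts]
sim = [y + 0.3 for y in ref]
avg = time_averaged_w2(ts, sim, ref)
print(avg)
```

In higher dimensions the sorted coupling no longer applies, and a discrete optimal transport solver would be needed in place of `w2_1d`.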

Theorems & Definitions (38)

  • Proposition 2.1: Existence of Monge transport map [villani2003topics]
  • Definition 2.2: Absolutely continuous curves and metric derivatives [ambrosio2008gradientflows]
  • Theorem 2.3: Absolutely continuous curves and velocity fields [ambrosio2008gradientflows]
  • Remark 2.4
  • Proposition 2.5: Transport map characterization of $v_t$ [ambrosio2008gradientflows]
  • Theorem 3.1
  • Remark 3.2
  • Remark 3.3: Unit speed parametrization
  • Lemma 3.4
  • proof
  • ...and 28 more