Swarm-based gradient descent meets simulated annealing

Zhiyan Ding, Martin Guerra, Qin Li, Eitan Tadmor

Abstract

We introduce a novel method for non-convex optimization, called Swarm-based Simulated Annealing (SSA), which is at the interface between swarm-based gradient descent (SBGD) [J. Lu et al., arXiv:2211.17157; E. Tadmor and A. Zenginoglu, Acta Applicandae Math., 190, 2024] and Simulated Annealing (SA) [V. Cerny, J. Optimization Theory and Appl., 45:41-51, 1985; S. Kirkpatrick et al., Science, 220(4598):671-680, 1983; S. Geman and C.-R. Hwang, SIAM J. Control and Optimization, 24(5):1031-1043, 1986]. Similar to SBGD, we introduce a swarm of agents, each identified with a position, ${\mathbf x}$, and a mass, $m$, to explore the ambient space. Similar to SA, the agents proceed in the gradient descent direction and are subject to Brownian motion. The annealing rate, however, is dictated by a decreasing function of their mass. As a consequence, instead of the SA protocol for time-decreasing temperature, we let the swarm decide how to 'cool down' agents, depending on their accumulated mass over time. The dynamics of the masses is coupled with the dynamics of the positions: agents at higher ground transfer (part of) their mass to those at lower ground. Consequently, the resulting SSA optimizer is dynamically divided between heavier, cooler agents viewed as 'leaders' and lighter, warmer agents viewed as 'explorers'. Mean-field convergence analysis and benchmark optimizations demonstrate the effectiveness of the swarm-based method as a multi-dimensional global optimizer.
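
The mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the exponential mass update relative to a mass-weighted average height, the noise schedule `sigma0 * exp(-beta * N * m)`, the test function `F`, and every parameter name are assumptions chosen only to show how gradient descent, mass-dependent Brownian noise, and mass transfer might fit together.

```python
import numpy as np


def F(x):
    """Hypothetical non-convex test function; x has shape (N, d)."""
    return np.sum(0.5 * x**2 + 1.0 - np.cos(3.0 * x), axis=1)


def grad_F(x):
    """Gradient of F, same shape as x."""
    return x + 3.0 * np.sin(3.0 * x)


def ssa_sketch(F, grad_F, x0, steps=2000, dt=0.01, sigma0=1.0, beta=2.0, seed=0):
    """Toy swarm loop: gradient descent plus noise whose amplitude decreases with mass.

    All update rules below are illustrative assumptions, not the paper's scheme.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)      # agent positions, shape (N, d)
    n = x.shape[0]
    m = np.full(n, 1.0 / n)            # equal initial masses, total mass 1

    for _ in range(steps):
        f = F(x)                       # height of each agent
        f_bar = np.dot(m, f)           # mass-weighted average height

        # Assumed mass dynamics: agents above the weighted average lose mass,
        # agents below it gain mass; renormalizing keeps the total equal to 1.
        m = m * np.exp(-dt * (f - f_bar))
        m = m / m.sum()

        # Assumed annealing rate: a decreasing function of mass, so heavy
        # agents ("leaders") are cool and light agents ("explorers") are warm.
        sigma = sigma0 * np.exp(-beta * n * m)

        # Euler-Maruyama step: gradient descent plus Brownian increment.
        noise = rng.standard_normal(x.shape)
        x = x - dt * grad_F(x) + np.sqrt(2.0 * dt) * sigma[:, None] * noise

    return x, m
```

One possible use is to start the swarm from uniformly sampled positions, for example `x0 = np.random.default_rng(1).uniform(-4, 4, size=(50, 2))`, and read off the heaviest agent with `x[np.argmax(m)]` as the swarm's current candidate minimizer.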

Paper Structure (25 sections, 13 theorems, 105 equations, 17 figures, 1 algorithm)

Key Result

Theorem 2.1

Assume that Assumption \ref{assumption: assmptn2} holds. Let $\mu_t=\mu_t(\mathbf x,m)$ be the mean-field solution of \eqref{eqn:mean_field_pde} and let $\mu^N_t = \frac{1}{N}\sum_{j=1}^{N}\delta_{\mathbf x^j_t}(\mathbf x)\otimes\delta_{m^j_t}(m)$ be the empirical distribution associated with the ensemble of swar… and the corresponding provisional minimum \eqref{eqn:ave_stochastic}, $\overline{F}^N_t$, converges to …

Figures (17)

  • Figure 6.1: Non-convex functions that will be used for the numerical examples when $d=1$.
  • Figure 6.2: $\sigma$ function for different values of the parameter $\beta$ and $\lambda=1$.
  • Figure 6.3: Swarm movement (in red) for Ackley function in $d=1$.
  • Figure 6.4: Expectation of $\overline{F}^N_t$ for different values of $N$ for Ackley function in $d=1$ in log-scale. The shaded area shows the difference between the lower quartile and the higher quartile for $\overline{F}^N_t$.
  • Figure 6.5: Swarm movement (in red) for Ackley function in $d=2$.
  • ...and 12 more figures

Theorems & Definitions (23)

  • Remark 1.1: On the choice of provisional minimum
  • Theorem 2.1: Mean-field limit
  • Theorem 2.2: Large time behavior
  • Theorem 2.3
  • Theorem 4.1: Lack of communication and failure in probability
  • Theorem 4.2
  • Lemma 5.1
  • Lemma 5.2
  • Proof of Lemma \ref{lem:aux_mean}
  • Proof of Lemma \ref{lem:aux_sde}
  • ...and 13 more