Swarm-based gradient descent meets simulated annealing

Zhiyan Ding; Martin Guerra; Qin Li; Eitan Tadmor

Abstract

We introduce a novel method for non-convex optimization, called Swarm-based Simulated Annealing (SSA), which is at the interface between swarm-based gradient descent (SBGD) [J. Lu et al., arXiv:2211.17157; E. Tadmor and A. Zenginoglu, Acta Applicandae Math., 190, 2024] and Simulated Annealing (SA) [V. Cerny, J. Optimization Theory and Appl., 45:41-51, 1985; S. Kirkpatrick et al., Science, 220(4598):671-680, 1983; S. Geman and C.-R. Hwang, SIAM J. Control and Optimization, 24(5):1031-1043, 1986]. Similar to SBGD, we introduce a swarm of agents, each identified with a position ${\mathbf x}$ and a mass $m$, to explore the ambient space. Similar to SA, the agents proceed in the gradient descent direction and are subject to Brownian motion. The annealing rate, however, is dictated by a decreasing function of their mass. As a consequence, instead of the SA protocol of a time-decreasing temperature, we let the swarm decide how to `cool down' agents, depending on their accumulated mass over time. The dynamics of the masses is coupled with the dynamics of the positions: agents at higher ground transfer (part of) their mass to those at lower ground. Consequently, the resulting SSA optimizer is dynamically divided between heavier, cooler agents viewed as `leaders' and lighter, warmer agents viewed as `explorers'. Mean-field convergence analysis and benchmark optimizations demonstrate the effectiveness of the swarm-based method as a multi-dimensional global optimizer.
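The mechanism described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration and not the authors' scheme: the mass-transfer rule (each agent's mass changes in proportion to how far its height lies above or below the mass-weighted average height), the exponential form of `sigma`, the finite-difference gradient, and all parameter values. The Ackley function is used as the objective because the figure list names it as a benchmark.

```python
import numpy as np

def ackley(x):
    """Ackley benchmark for x of shape (N, d); global minimum 0 at the origin."""
    d = x.shape[1]
    s1 = np.sqrt(np.sum(x**2, axis=1) / d)
    s2 = np.sum(np.cos(2.0 * np.pi * x), axis=1) / d
    return -20.0 * np.exp(-0.2 * s1) - np.exp(s2) + 20.0 + np.e

def num_grad(F, x, h=1e-5):
    """Central-difference gradient of F at each row of x (illustrative helper)."""
    g = np.zeros_like(x)
    for k in range(x.shape[1]):
        e = np.zeros(x.shape[1])
        e[k] = h
        g[:, k] = (F(x + e) - F(x - e)) / (2.0 * h)
    return g

def sigma(m, n_agents, beta=2.0):
    """ASSUMED annealing rate: any decreasing function of mass would do here."""
    return np.exp(-beta * n_agents * m)

def ssa_step(F, x, m, dt, rng):
    """One explicit Euler step of the sketched position/mass dynamics."""
    f = F(x)
    f_bar = np.sum(m * f) / np.sum(m)  # mass-weighted average height
    # ASSUMED transfer rule: agents above f_bar lose mass, agents below gain it.
    # The increments sum to zero, so total mass is conserved up to rounding.
    # Masses stay positive as long as dt * (f - f_bar) < 1.
    m_new = m * (1.0 - dt * (f - f_bar))
    # Gradient descent plus Brownian motion; heavier agents receive less noise.
    noise = np.sqrt(2.0 * sigma(m, len(m)) * dt)[:, None]
    x_new = x - dt * num_grad(F, x) + noise * rng.standard_normal(x.shape)
    return x_new, m_new

def run_ssa(F, n_agents=50, d=1, steps=2000, dt=0.01, seed=0):
    """Run the sketch from uniform masses; return positions, masses, average height."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, size=(n_agents, d))
    m = np.full(n_agents, 1.0 / n_agents)
    for _ in range(steps):
        x, m = ssa_step(F, x, m, dt, rng)
    f = F(x)
    return x, m, float(np.sum(m * f) / np.sum(m))

if __name__ == "__main__":
    x, m, f_bar = run_ssa(ackley)
    leader = int(np.argmax(m))
    print("heaviest agent at", x[leader], "with mass", m[leader])
    print("mass-weighted average height:", f_bar)
```

In this sketch the heaviest agent plays the role of a `leader' (its noise amplitude is close to zero), while agents that have given up most of their mass keep a noise amplitude near the maximum and act as `explorers'.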

Paper Structure (25 sections, 13 theorems, 105 equations, 17 figures, 1 algorithm)

Introduction
In-swarm communication combined with stochastic search
The swarm-based optimization with adjusted annealing rate
Related work
Statement of main results
From empirical distribution to mean-field
Large-time convergence: from mean-field to global minimum
From mean-field to macroscopic description
Pitfalls
Swarming with no communication
Stochastic system
Proofs of the main results
Convergence to the mean-field limit
Large time behavior
Numerical experiments
...and 10 more sections

Key Result

Theorem 2.1

Assume that Assumption (assmptn2) holds. Let $\mu_t=\mu_t(\mathbf x,m)$ be the mean-field solution of (eqn:mean_field_pde) and let $\mu^N_t = \frac{1}{N}\sum_{j=1}^{N}\delta_{\mathbf x^j_t}(\mathbf x)\otimes\delta_{m^j_t}(m)$ be the empirical distribution associated with the ensemble of swar... and the corresponding provisional minimum (eqn:ave_stochastic), $\overline{F}^N_t$, converges to ...

Figures (17)

Figure 6.1: Non-convex functions that will be used for the numerical examples when $d=1$.
Figure 6.2: $\sigma$ function for different values of the parameter $\beta$ and $\lambda=1$.
Figure 6.3: Swarm movement (in red) for Ackley function in $d=1$.
Figure 6.4: Expectation of $\overline{F}^N_t$ for different values of $N$ for Ackley function in $d=1$ in log-scale. The shaded area shows the difference between the lower quartile and the higher quartile for $\overline{F}^N_t$.
Figure 6.5: Swarm movement (in red) for Ackley function in $d=2$.
...and 12 more figures

Theorems & Definitions (23)

Remark 1.1: On the choice of provisional minimum
Theorem 2.1: Mean-field limit
Theorem 2.2: Large time behavior
Theorem 2.3
Theorem 4.1: Lack of communication and failure in probability
Theorem 4.2
Lemma 5.1
Lemma 5.2
Proof of Lemma \ref{lem:aux_mean}
Proof of Lemma \ref{lem:aux_sde}
...and 13 more
