Swarm-Based Gradient Descent Method for Non-Convex Optimization

Jingcheng Lu; Eitan Tadmor; Anil Zenginoglu

TL;DR

The paper presents Swarm-Based Gradient Descent (SBGD), a new global optimizer for non-convex functions in which each agent of a swarm carries a mass in addition to its position. Heavier agents move conservatively, while lighter agents explore with larger time steps. Mass transfer toward agents with lower objective values creates a dynamic leader–explorer hierarchy, and a backtracking line search weighted by relative mass enforces descent. The authors provide convergence and error analyses, including descent bounds and possible convergence to a band of equi-height local minima, and validate the method through extensive 1D, 2D, and 20D experiments against standard GD, GD with backtracking (GD(BT)), and Adam; these experiments highlight improved robustness to poor initializations and enhanced global exploration. The results suggest that SBGD is a competitive global optimizer and a potential pre-conditioner for high-dimensional non-convex problems, with practical implications for reliably escaping local minima in complex landscapes.
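
The mass-transfer step behind the leader–explorer hierarchy can be illustrated in a few lines. This is a minimal sketch under an illustrative assumption, not the paper's exact rule: an agent sheds the fraction $\eta^p$ of its mass, where $\eta\in[0,1]$ is its normalized height between the best and worst objective values, and all shed mass goes to the current best agent. The helper `relative_mass` computes a weight $(m/\max m)^q$ of the kind that can make heavier agents step more cautiously. Both function names are hypothetical.

```python
def transfer_mass(values, masses, p=1.0):
    """Move mass from agents on higher ground to the current best agent.

    Illustrative assumption (not the paper's exact formula): agent i sheds
    eta_i**p of its mass, where eta_i in [0, 1] is its normalized height,
    and all shed mass is added to the lowest agent.
    """
    f_min, f_max = min(values), max(values)
    best = values.index(f_min)
    new_masses = list(masses)
    if f_max == f_min:
        return new_masses  # every agent at the same height: nothing to move
    shed_total = 0.0
    for i, (f, m) in enumerate(zip(values, masses)):
        if i == best:
            continue
        eta = (f - f_min) / (f_max - f_min)  # 0 at the best agent, 1 at the worst
        shed = eta ** p * m
        new_masses[i] = m - shed
        shed_total += shed
    new_masses[best] += shed_total  # total mass is conserved
    return new_masses


def relative_mass(masses, q=2.0):
    """Hypothetical weight psi_q = (m / max m)**q: near 1 for leaders, near 0 for explorers."""
    m_max = max(masses)
    return [(m / m_max) ** q for m in masses]
```

For example, with objective values [3, 1, 2] and equal masses of 1/3, the worst agent gives up all of its mass, the middle agent gives up half, and the lowest agent ends with 5/6 of the total; its relative mass is then 1, while the middle agent's is $(1/5)^2 = 0.04$.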

Abstract

We introduce a new Swarm-Based Gradient Descent (SBGD) method for non-convex optimization. The swarm consists of agents, each identified with a position, ${\mathbf x}$, and a mass, $m$. The key to their dynamics is communication: mass is transferred from agents on high ground to agents on low (or the lowest) ground. At the same time, agents change positions with a step size, $h=h({\mathbf x},m)$, adjusted to their relative mass: heavier agents proceed with small time steps in the direction of the local gradient, while lighter agents take larger time steps based on a backtracking protocol. Accordingly, the crowd of agents is dynamically divided between 'heavier' leaders, expected to approach local minima, and 'lighter' explorers. With their large-step protocol, explorers are expected to encounter improved positions for the swarm; if they do, they assume the role of 'heavy' swarm leaders, and so on. Convergence analysis and numerical simulations in one-, two-, and 20-dimensional benchmarks demonstrate the effectiveness of SBGD as a global optimizer.
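
The two mechanisms in the abstract, mass communication and mass-dependent time stepping, can be combined into a minimal 1D loop. This is a sketch under stated assumptions, not the authors' SBGD${}_{pq}$ algorithm: the shed fraction $\eta^p$, the Armijo-type acceptance test with parameter $\lambda(m/\max m)^q$, the parameter defaults, the test function, and the name `sbgd_sketch` are all illustrative choices. Because every accepted move must pass a sufficient-decrease test, no agent's objective value ever increases.

```python
def sbgd_sketch(F, dF, xs, iters=100, p=1.0, q=2.0, lam=0.2, h0=1.0, gamma=0.9):
    """Minimal 1D swarm-based gradient descent loop (illustrative sketch).

    Assumptions, not taken from the paper: agent i sheds eta_i**p of its mass
    (eta_i = normalized height), shed mass goes to the lowest agent, and each
    agent backtracks with Armijo parameter lam * (m_i / max m)**q.
    """
    n = len(xs)
    xs = list(xs)
    ms = [1.0 / n] * n  # equal initial masses, total mass 1
    for _ in range(iters):
        fs = [F(x) for x in xs]
        # Communication: mass flows from high ground to the lowest agent.
        f_min, f_max = min(fs), max(fs)
        best = fs.index(f_min)
        if f_max > f_min:
            shed_total = 0.0
            for i in range(n):
                if i != best:
                    eta = (fs[i] - f_min) / (f_max - f_min)
                    shed = eta ** p * ms[i]
                    ms[i] -= shed
                    shed_total += shed
            ms[best] += shed_total
        m_max = max(ms)
        # Time stepping: heavier agents face a stricter descent test, so they
        # accept smaller steps; light agents accept larger, exploratory ones.
        for i in range(n):
            g = dF(xs[i])
            c = lam * (ms[i] / m_max) ** q
            h = h0
            for _ in range(200):  # cap the number of shrinkages
                if F(xs[i] - h * g) <= fs[i] - c * h * g * g:
                    xs[i] -= h * g
                    break
                h *= gamma  # backtrack: shrink the trial step
    best = min(range(n), key=lambda i: F(xs[i]))
    return xs[best], ms


# Hypothetical double-well test function (not one of the paper's benchmarks).
def F(x):
    return x ** 4 / 4 - x ** 2 + 0.3 * x


def dF(x):
    return x ** 3 - 2 * x + 0.3


x_best, masses = sbgd_sketch(F, dF, [0.5, 1.0, 1.5, 2.0, 2.5], iters=50)
print(x_best, F(x_best))
```

Here all agents start in the shallower right-hand well; whether a light explorer reaches the deeper well on the left depends on the parameters, so the sketch only guarantees that the swarm's best value never increases and that the total mass stays equal to 1.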

Paper Structure

This paper contains 25 sections, 4 theorems, 53 equations, 10 figures, 28 tables, and 2 algorithms.

Introduction
The Swarm-Based Gradient Descent (SBGD) algorithm
Why communication is important
Alignment towards minimal heading
Implementation of the SBGD${}_{pq}$ algorithm
Communications and mass transition
Backtracking: a protocol for time stepping
SBGD${}_{pq}$ pseudocode
A general outlook
  General gradient descent directions
  Swarm-based optimization: a general paradigm
  Survival of the fittest
Convergence and error analysis
Convergence to a band of local minima
Flatness and convergence rate
...and 10 more sections

Key Result

Lemma 5.1

Consider the SBGD${}_{pq}$ iterations (eq:SBGD), with step size $h^n_i=h(\mathbf{x}_i^n,\lambda\psi_q(\widetilde{m}^{n+1}_i))$ determined by the backtracking line search in Algorithm (alg:backtracking), with shrinkage factor $\gamma\in (0,1)$ and an initial step size, $h_0$, large enough so that … Then we have the descent bound …
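
The step size in Lemma 5.1 comes from a backtracking search whose sufficient-decrease parameter is $\lambda\psi_q(\widetilde{m})$. Assuming this is a standard Armijo-type test, $F(\mathbf x - h\nabla F(\mathbf x)) \le F(\mathbf x) - \lambda\psi_q h|\nabla F(\mathbf x)|^2$, with trial steps $h_0, \gamma h_0, \gamma^2 h_0, \dots$, the search can be sketched as below. The function `backtracking_step`, its defaults, and the iteration cap are hypothetical. A larger parameter, i.e. a heavier agent, makes the test stricter and therefore yields a smaller accepted step.

```python
def backtracking_step(F, grad, x, c, h0=1.0, gamma=0.9, max_shrinks=200):
    """Return (h, x_new) with F(x_new) <= F(x) - c * h * |grad F(x)|^2.

    Assumption: a standard Armijo backtracking search with parameter
    c = lam * psi_q(m) and shrinkage factor gamma in (0, 1).
    """
    g = grad(x)
    g2 = sum(gi * gi for gi in g)  # squared gradient norm
    fx = F(x)
    h = h0
    for _ in range(max_shrinks):
        x_new = [xi - h * gi for xi, gi in zip(x, g)]
        if F(x_new) <= fx - c * h * g2:  # sufficient decrease achieved
            return h, x_new
        h *= gamma  # shrink the trial step and try again
    return 0.0, list(x)  # no acceptable step found: stay put
```

On the quadratic $F(\mathbf x) = x_1^2 + 10x_2^2$ started from $(1,1)$, an agent with $c=0.9$ accepts a smaller step than one with $c=0.1$, mirroring the cautious leaders and bold explorers described in the abstract.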

Figures (10)

Figure 2.1: Plot of the objective function (eq:flat basins).
Figure 2.2: Basins of attraction for the GD(0.8) and Adam(0.1) methods.
Figure 2.3: Histograms for problem (eq:flat basins) over $m=200$ experiments. Initial data are generated uniformly in $[-3, -1]$. The global minimum is at $x^* = 1.5355$.
Figure 6.1: Benchmark functions
Figure 6.2: 1D Ackley with $B = C = 0$. Four iterations of SBGD, visualized on the Ackley landscape, show the dynamics of merged agents and the convergence patterns.
...and 5 more figures

Theorems & Definitions (7)

Lemma 5.1
Proposition 5.2
Remark 5.3
Theorem 5.4
Remark 5.5
Theorem 5.6
Remark 6.1
