Swarm-based optimization with random descent

Eitan Tadmor, Anil Zenginoglu

TL;DR

The paper addresses non-convex optimization by extending swarm-based gradient descent to allow random descent directions, preserving a descent guarantee via a mass-driven update scheme. The Swarm-Based Random Descent (SBRD) algorithm combines mass transfer toward the current minimizer with a descent step along directions drawn from a spherical cap around the gradient, using a backtracking line search to ensure progress. Theoretical results establish convergence to a band of local minima and, for analytic functions, provide rate estimates via a Łojasiewicz framework, while numerical experiments show that SBRD outperforms the gradient-based variant in high-dimensional settings. These findings demonstrate that random directional exploration, coupled with adaptive mass dynamics and backtracking, yields a robust multi-dimensional global optimizer for non-convex problems.
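The spherical-cap direction can be illustrated with a short sketch. It assumes a cap of half-angle $\theta<\pi/2$ around $\mathbf q=\nabla F/|\nabla F|$ and uses a hypothetical sampler, `sample_cap_direction`, that draws the polar angle uniformly in $[0,\theta]$ and the azimuth uniformly around $\mathbf q$. This is not uniform with respect to surface measure on the cap and is not necessarily the paper's Algorithm alg:random. Any unit vector $\boldsymbol\omega$ on such a cap satisfies $\langle\nabla F,\boldsymbol\omega\rangle\geq|\nabla F|\cos\theta>0$, so $-\boldsymbol\omega$ is a descent direction whenever $\nabla F\neq 0$.

```python
import math
import random


def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Return v / |v|; assumes v is nonzero."""
    n = math.sqrt(dot(v, v))
    return [c / n for c in v]


def sample_cap_direction(grad, theta, rng=random):
    """Draw a unit vector omega with <omega, q> >= cos(theta), q = grad/|grad|.

    Hypothetical sampler: the polar angle phi is uniform on [0, theta] and the
    azimuth is a uniformly random unit vector orthogonal to q. This is NOT
    uniform with respect to surface measure on the cap. Assumes grad != 0.
    """
    q = unit(grad)
    if len(q) == 1:
        return q  # in 1-D, for theta < pi, the cap reduces to {q}
    while True:
        # Gaussian sample with its q-component projected out -> uniform azimuth.
        g = [rng.gauss(0.0, 1.0) for _ in q]
        s = dot(g, q)
        u = [gi - s * qi for gi, qi in zip(g, q)]
        if dot(u, u) > 1e-24:
            break
    u = unit(u)
    phi = rng.uniform(0.0, theta)
    # omega = cos(phi) q + sin(phi) u is a unit vector at angle phi from q.
    return [math.cos(phi) * qi + math.sin(phi) * ui for qi, ui in zip(q, u)]


rng = random.Random(0)
grad = [3.0, -4.0, 1.0]
omega = sample_cap_direction(grad, theta=math.pi / 6, rng=rng)
print(dot(grad, omega) > 0)  # True: -omega is a descent direction
```

Projecting a Gaussian sample onto the orthogonal complement of $\mathbf q$ gives a uniform azimuth in any dimension $d\geq 2$ without rejection sampling, which becomes inefficient for caps in high dimensions.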

Abstract

We extend our study of the swarm-based gradient descent method for non-convex optimization, [Lu, Tadmor & Zenginoglu, arXiv:2211.17157], to allow random descent directions. We recall that the swarm-based approach consists of a swarm of agents, each identified with a position, ${\mathbf x}$, and mass, $m$. The key is the transfer of mass from high ground to low(-est) ground. The mass of an agent dictates its step size: lighter agents take larger steps. In this paper, the essential new feature is the choice of direction: rather than restricting the swarm to march in the steepest gradient descent, we let agents proceed in randomly chosen directions centered around, but otherwise different from, the gradient direction. The random search secures the descent property while at the same time enabling greater exploration of the ambient space. Convergence analysis and benchmark optimizations demonstrate the effectiveness of the swarm-based random descent method as a multi-dimensional global optimizer.
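The mass-transfer mechanism can be sketched as follows. The abstract does not spell out the transfer rule, so `transfer_mass` below implements a hypothetical rule modeled on the swarm-based gradient descent setting: each agent sheds the fraction $\eta_i=\big((F_i-F_{\min})/(F_{\max}-F_{\min})\big)^q$ of its mass, with an assumed power $q\geq 1$, and the current minimizer collects everything that was shed. It returns the new masses together with the relative masses $\widetilde m_i=m_i/\max_j m_j$, which a mass-dependent step size would consume.

```python
def transfer_mass(F_vals, masses, q=1.0):
    """Hypothetical mass-transfer step toward the current minimizer.

    Each agent i sheds eta_i * m_i with
        eta_i = ((F_i - F_min) / (F_max - F_min)) ** q,
    and the minimizer collects everything shed, so total mass is conserved.
    Returns (new_masses, relative_masses) with relative = m_i / max_j m_j.
    Assumes the total mass is positive.
    """
    f_min, f_max = min(F_vals), max(F_vals)
    i_min = F_vals.index(f_min)  # first agent attaining the minimum
    new = list(masses)
    if f_max > f_min:  # otherwise all agents are level: no transfer
        shed = 0.0
        for i, (f, m) in enumerate(zip(F_vals, masses)):
            if i == i_min:
                continue
            eta = ((f - f_min) / (f_max - f_min)) ** q  # 0 <= eta <= 1
            new[i] = m - eta * m
            shed += eta * m
        new[i_min] += shed
    m_max = max(new)
    relative = [m / m_max for m in new]
    return new, relative


new, rel = transfer_mass([0.0, 1.0, 0.5], [1 / 3, 1 / 3, 1 / 3], q=1.0)
print(new)  # approximately [0.833, 0.0, 0.167]
print(rel)  # approximately [1.0, 0.0, 0.2]
```

Under this assumed rule the worst agent ($\eta_i=1$) loses all of its mass, and in the example the minimizer ends up as the heaviest agent, so, with lighter agents taking larger steps, it would take the most cautious step.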

Paper Structure (11 sections, 3 theorems, 54 equations, 4 figures, 8 tables, 2 algorithms)

Key Result

Proposition 3.1

Consider the SBRD iterations (eqs:SBD) with the random search direction ${\mathbf p}^n_i$ determined by Algorithm alg:random, and with the step size (eq:step length) $h^n_i=h(\mathbf{x}_i^n,\lambda\widetilde{m}^{n+1}_i)$ determined by the backtracking line search of Algorithm alg:backtracking. Let … Here, $q\geqslant 1$ is the mass-transfer parameter in (eq:etai).
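The role of the step size $h(\mathbf x,\lambda\widetilde m)$ can be sketched with a generic Armijo backtracking search along $-\mathbf p$. The sufficient-decrease test below, in which the product $c=\lambda\widetilde m$ plays the role of the Armijo constant, is an assumption about how mass enters the rule, not a transcription of Algorithm alg:backtracking. The function `backtracking_step` and its parameters `h0`, `shrink`, and `max_iter` are hypothetical.

```python
def backtracking_step(F, x, grad, p, c, h0=1.0, shrink=0.5, max_iter=60):
    """Hypothetical Armijo-type backtracking along -p.

    Shrinks the trial step h, starting from h0, until
        F(x - h p) <= F(x) - c * h * <grad, p>.
    Here c stands for lambda times the agent's relative mass, so a lighter
    agent (smaller c) faces a weaker decrease requirement. Assumes <grad, p> > 0.
    Returns (h, new_x); h = 0.0 means no acceptable step was found.
    """
    fx = F(x)
    slope = sum(g * pj for g, pj in zip(grad, p))
    h = h0
    for _ in range(max_iter):
        trial = [xj - h * pj for xj, pj in zip(x, p)]
        if F(trial) <= fx - c * h * slope:
            return h, trial
        h *= shrink
    return 0.0, list(x)


def sq(v):
    """F(x) = |x|^2, whose gradient is 2x."""
    return sum(t * t for t in v)


x0, g0 = [1.0, 2.0], [2.0, 4.0]
h_light, _ = backtracking_step(sq, x0, g0, g0, c=0.1)
h_heavy, _ = backtracking_step(sq, x0, g0, g0, c=0.9)
print(h_light, h_heavy)  # 0.5 0.0625
```

On $F(\mathbf x)=|\mathbf x|^2$ from $\mathbf x=(1,2)$ with $\mathbf p=\nabla F$, the light agent ($c=0.1$) accepts $h=0.5$ while the heavy agent ($c=0.9$) backtracks to $h=0.0625$, matching the principle that lighter agents take larger steps.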

Figures (4)

  • Figure 1.1: ${\mathbf q}^n_i$ is the gradient orientation, i.e., the unit vector along the gradient direction $\nabla F(\mathbf{x}_i^n)$; the unit vector ${\boldsymbol \omega}^n_i$ is determined by a randomly chosen point on a spherical cap centered around ${\mathbf q}^n_i$ (shown as the shaded part of the sphere).
  • Figure 4.1: Two-dimensional landscapes of the test functions Ackley (eq:Ackley), Rastrigin (eq:Rastrigin), Rosenbrock (eq:Rosenbrock), and Styblinski-Tang (eq:ST), with a contour plot at the bottom and a red star marking the global minimum.
  • Figure 4.2: Loss functions of the minimizers and the heaviest agents, as defined in (eqs:XminXplus) and (eq:Xplus), during optimization of the two-dimensional Ackley function over $20$ simulations with $N=50$ agents. Note the different $y$-axis scales for the minimizers and the heaviest agents.
  • Figure 4.3: A comparison of random descent, $-{\mathbf p}^n_i$ (orange arrows), and gradient descent, $-\nabla F(\mathbf{x}_i^n)$ (brown lines), during a simulation optimizing the Ackley function (left) and the Rastrigin function (right). The green triangle marks the current minimizer; the inverted red triangle marks the worst agent. Both the angle between the two directions and the step size are larger for lighter agents. Random descent is a better alternative to gradient descent for some agents and worse for others.
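For experimentation, the four benchmark functions named in Figure 4.1 can be written in their standard textbook forms. The constants below ($a=20$, $b=0.2$, $c=2\pi$ for Ackley and $A=10$ for Rastrigin) are the conventional choices and are an assumption here: the paper's own definitions may use different constants, shifts, or search domains.

```python
import math


def ackley(x, a=20.0, b=0.2, c=2 * math.pi):
    """Standard Ackley function; global minimum 0 at the origin."""
    d = len(x)
    s1 = sum(t * t for t in x) / d
    s2 = sum(math.cos(c * t) for t in x) / d
    return -a * math.exp(-b * math.sqrt(s1)) - math.exp(s2) + a + math.e


def rastrigin(x, A=10.0):
    """Standard Rastrigin function; global minimum 0 at the origin."""
    return A * len(x) + sum(t * t - A * math.cos(2 * math.pi * t) for t in x)


def rosenbrock(x):
    """Standard Rosenbrock function; global minimum 0 at (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))


def styblinski_tang(x):
    """Standard Styblinski-Tang function; minimum near x_i = -2.903534."""
    return 0.5 * sum(t ** 4 - 16.0 * t ** 2 + 5.0 * t for t in x)


print(ackley([0.0, 0.0]))  # approximately 0 (up to rounding)
print(rastrigin([0.0, 0.0]))  # 0.0
print(rosenbrock([1.0, 1.0, 1.0]))  # 0.0
print(styblinski_tang([-2.903534, -2.903534]))  # approximately -78.332
```

With these conventions the global minima are $0$ at the origin for Ackley and Rastrigin, $0$ at $(1,\dots,1)$ for Rosenbrock, and approximately $-39.166\,d$ at $x_i\approx-2.9035$ for Styblinski-Tang in dimension $d$.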

Theorems & Definitions (4)

  • Remark 1.1
  • Proposition 3.1
  • Theorem 3.2
  • Theorem 3.3