Leveraging Memory Effects and Gradient Information in Consensus-Based Optimization: On Global Convergence in Mean-Field Law
Konstantin Riedl
TL;DR
The paper tackles global optimization in high dimensions for nonconvex, nonsmooth objectives by analyzing a memory-augmented consensus-based optimization (CBO) method that also incorporates gradient information. It develops a mean-field framework, deriving a nonlinear SDE and the corresponding Fokker-Planck equation, and proves exponential convergence of the law to the global minimizer $x^*$ of ${\cal E}$ via a Lyapunov functional ${\cal V}(\rho_t)$ with explicit rates. A quantitative Laplace principle bound and a positive lower bound on the probability mass near $x^*$ enable nonasymptotic convergence guarantees, while numerical experiments in machine learning and compressed sensing illustrate practical benefits of memory and gradient terms. Overall, the work provides rigorous global convergence theory for a flexible CBO variant and demonstrates its effectiveness across challenging high-dimensional tasks.
Abstract
In this paper we study consensus-based optimization (CBO), a versatile, flexible and customizable optimization method suitable for performing nonconvex and nonsmooth global optimization in high dimensions. CBO is a multi-particle metaheuristic that is effective in various applications and at the same time amenable to theoretical analysis thanks to its minimalistic design. The underlying dynamics, however, is flexible enough to incorporate different mechanisms widely used in evolutionary computation and machine learning, as we show by analyzing a variant of CBO that makes use of memory effects and gradient information. We rigorously prove that this dynamics converges to a global minimizer of the objective function in mean-field law for a vast class of functions under minimal assumptions on the initialization of the method. In particular, the proof reveals how to leverage further forces in the dynamics, which are advantageous in some applications, without losing provable global convergence. To demonstrate the benefit of the memory effects and gradient information investigated here, we present numerical evidence for the superiority of this CBO variant in applications such as machine learning and compressed sensing, which en passant widens the scope of applications of CBO.
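To make the mechanism described in the abstract concrete, the following is a minimal sketch of a particle scheme that combines a consensus drift, a memory term, and a gradient term. It is an illustration under stated assumptions, not the scheme analyzed in the paper: the Euler-Maruyama discretization, the hard "keep the best position seen so far" memory update, the componentwise noise acting only on the consensus term, and all names and default values (`cbo_memory_gradient`, `lam_consensus`, `lam_memory`, `lam_grad`, `sigma`, `alpha`) are hypothetical choices made here. The consensus point is the weighted average of the stored best positions $y^i$ with weights $\exp(-\alpha\,{\cal E}(y^i))$, which concentrates on the best stored position as $\alpha$ grows.

```python
import numpy as np


def consensus_point(points, energies, alpha):
    """Weighted average of `points` (shape (N, d)) with Gibbs weights.

    Weights are exp(-alpha * energy); subtracting the minimum energy
    leaves the average unchanged and avoids numerical underflow.
    """
    weights = np.exp(-alpha * (energies - energies.min()))
    return (weights[:, None] * points).sum(axis=0) / weights.sum()


def cbo_memory_gradient(E, grad_E, x0, steps=2000, dt=0.01, alpha=50.0,
                        lam_consensus=1.0, lam_memory=0.4, lam_grad=0.1,
                        sigma=0.5, seed=0):
    """Illustrative CBO-type iteration with memory and gradient drift.

    E maps an (N, d) array to N objective values; grad_E maps it to an
    (N, d) array of gradients. All parameter names are assumptions of
    this sketch.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()          # current particle positions
    y = x0.copy()          # memory: best position seen by each particle
    ey = E(y)              # objective values at the stored best positions
    for _ in range(steps):
        c = consensus_point(y, ey, alpha)
        diff = x - c
        noise = rng.standard_normal(x.shape)
        x = (x
             - lam_consensus * dt * diff        # drift toward consensus
             - lam_memory * dt * (x - y)        # drift toward own best
             - lam_grad * dt * grad_E(x)        # local gradient step
             + sigma * np.sqrt(dt) * diff * noise)  # componentwise noise
        ex = E(x)
        improved = ex < ey                      # hard memory update
        y[improved] = x[improved]
        ey[improved] = ex[improved]
    return consensus_point(y, ey, alpha)


if __name__ == "__main__":
    # Shifted Rastrigin-type objective with global minimizer at (1, 1).
    def E(x):
        z = x - 1.0
        return (z**2 + 2.5 * (1.0 - np.cos(2.0 * np.pi * z))).sum(axis=1)

    def grad_E(x):
        z = x - 1.0
        return 2.0 * z + 5.0 * np.pi * np.sin(2.0 * np.pi * z)

    x0 = np.random.default_rng(1).uniform(-3.0, 3.0, size=(200, 2))
    print(cbo_memory_gradient(E, grad_E, x0))
```

Setting `lam_memory` and `lam_grad` to zero and computing the consensus point from the current positions instead of the stored best positions gives a plain CBO iteration, which is one way to compare the effect of the two additional terms in an experiment.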
