Momentum Accelerates Evolutionary Dynamics

Marc Harper, Joshua Safyan

TL;DR

This work investigates accelerating evolutionary dynamics on the probability simplex by injecting momentum, interpreted as intergenerational memory. Using the Kullback–Leibler divergence $D_{KL}(\hat{x}||x)$ as a Lyapunov function, it proves that momentum preserves evolutionarily stable states for small momentum while speeding up convergence of both the replicator dynamics and Euclidean gradient descent, with a continuous-time scaling factor of $\tfrac{1}{1-\beta}$. It further shows that momentum can qualitatively alter dynamics, potentially breaking cycles in rock–paper–scissors landscapes into convergence or divergence depending on momentum type and learning rate. The results bridge evolutionary game theory and optimization, providing analytic convergence-rate enhancements and clear conditions under which stability is maintained. Supplemental proofs and open-source code support rigorous validation and reproducibility.

Abstract

We combine momentum from machine learning with evolutionary dynamics, where momentum can be viewed as a simple mechanism of intergenerational memory. Using information divergences as Lyapunov functions, we show that momentum accelerates the convergence of evolutionary dynamics including the replicator equation and Euclidean gradient descent on populations. When evolutionarily stable states are present, these methods prove convergence for small learning rates or small momentum, and yield an analytic determination of the relative decrease in time to converge that agrees well with computations. The main results apply even when the evolutionary dynamic is not a gradient flow. We also show that momentum can alter the convergence properties of these dynamics, for example by breaking the cycling associated to the rock-paper-scissors landscape, leading to either convergence to the ordinarily non-absorbing equilibrium, or divergence, depending on the value and mechanism of momentum.
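
The acceleration claim can be checked numerically with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper's code: the circulant landscape $A = [[0,a,b],[b,0,a],[a,b,0]]$ with $a=1=b$, a discrete replicator step $x_i \leftarrow x_i + \alpha z_i$ with a standard Polyak (heavy-ball) buffer $z \leftarrow \beta z + g(x)$, the starting point, and the stopping tolerance on $D_{KL}(\hat{x}||x)$.

```python
import math


def fitness(x, a, b):
    # Assumed circulant landscape f(x) = A x, A = [[0, a, b], [b, 0, a], [a, b, 0]].
    n = len(x)
    row = [0.0, a, b]
    return [sum(row[(j - i) % n] * x[j] for j in range(n)) for i in range(n)]


def kl_divergence(p, q):
    # D_KL(p || q) for strictly positive q; terms with p_i = 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def iterations_to_converge(x0, beta, alpha=1 / 200, a=1.0, b=1.0,
                           tol=1e-6, max_iter=200_000):
    # Count steps until D_KL(ess || x) < tol; return None if that never happens.
    ess = [1 / 3, 1 / 3, 1 / 3]  # interior rest point of the assumed landscape
    x = list(x0)
    z = [0.0] * len(x)  # momentum buffer ("intergenerational memory")
    for t in range(max_iter):
        if kl_divergence(ess, x) < tol:
            return t
        f = fitness(x, a, b)
        mean_f = sum(xi * fi for xi, fi in zip(x, f))
        # Replicator increment: g_i = x_i * (f_i - mean fitness); sums to zero.
        g = [xi * (fi - mean_f) for xi, fi in zip(x, f)]
        # Polyak (heavy-ball) momentum, assumed form.
        z = [beta * zi + gi for zi, gi in zip(z, g)]
        x = [xi + alpha * zi for xi, zi in zip(x, z)]
    return None


if __name__ == "__main__":
    start = [0.6, 0.3, 0.1]  # hypothetical initial population
    t_plain = iterations_to_converge(start, beta=0.0)
    t_momentum = iterations_to_converge(start, beta=0.5)
    print(t_plain, t_momentum, t_momentum / t_plain)
```

Under these assumptions the ratio of iteration counts should land near $1 - \beta$, in line with the scaling described above. Larger $\beta$ or larger $\alpha$ may break this approximation.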

Paper Structure

This paper contains 17 sections, 4 theorems, 18 equations, 5 figures.

Key Result

Theorem 1

Let $\hat{x}$ be an ESS for a replicator dynamic. Then $D_{KL}(\hat{x}||x)$ is a local Lyapunov function for the discrete and continuous replicator dynamic.
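
For the continuous, momentum-free case, the standard argument behind a statement of this kind can be sketched as follows. It assumes the usual replicator form $\dot{x}_i = x_i\,(f_i(x) - \bar{f}(x))$ with mean fitness $\bar{f}(x) = x \cdot f(x)$, and the usual ESS condition $\hat{x} \cdot f(x) > x \cdot f(x)$ for $x \neq \hat{x}$ near $\hat{x}$. The notation $f$, $\bar{f}$ is introduced here for illustration.

$$
\frac{d}{dt} D_{KL}(\hat{x}||x)
= -\sum_i \hat{x}_i \frac{\dot{x}_i}{x_i}
= -\sum_i \hat{x}_i \left( f_i(x) - \bar{f}(x) \right)
= -\left( \hat{x} \cdot f(x) - x \cdot f(x) \right) < 0
$$

The second equality substitutes the replicator equation, and the third uses $\sum_i \hat{x}_i = 1$. The divergence therefore decreases along trajectories near $\hat{x}$, which is what makes it a local Lyapunov function.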

Figures (5)

  • Figure 1: Examples of altered convergence time for Polyak (top 2) and Nesterov (bottom 2) momentum. In all cases we use a landscape with $a=2$ and $b=1$ and $\alpha=1/200$. As $\beta$ increases, the dynamics typically converge faster, and the trajectories are not identical since $\alpha > 0$. However, for Polyak momentum (top), as the value of $\beta$ becomes closer to 1, the Lyapunov quantity eventually fails to be monotonic along the entirety of the trajectory (it is at best local). Contrast with the Nesterov momentum trajectories (bottom) for the same parameters, which in this case are all monotonically decreasing.
  • Figure 2: For large values of momentum, the dynamic may fail to converge where the momentum-free dynamic does, if $\alpha$ is not sufficiently small. For all trajectories here $\alpha=0.01$, $a=2$, and $b=-1$. Lowering $\alpha$ to 0.001 restores convergence of the red $\beta=0.9$ curve.
  • Figure 3: Left: Convergence speed-up for Polyak momentum. For small learning rates, the convergence time (in iterations) is well approximated by $(1 - \beta)$ times the momentum-free convergence time ($\beta=0$). Right: The dynamic with Nesterov momentum is also fairly well approximated by a constant factor times the momentum-free convergence time, but is clearly not scaled by the same factor. The fitness landscape is defined by $a=1=b$.
  • Figure 4: Rock-paper-scissors landscape ($a=1$, $b=-1$) with momentum $\beta=0.65$ and learning rate $\alpha=1/200$. On this landscape the replicator equation cycles indefinitely, with a constant KL-divergence determined by the initial point. Adding momentum with a non-zero learning rate can cause the cycling to break into either convergence or divergence. In this case Nesterov momentum causes the dynamic to converge, while Polyak momentum causes the dynamic to slowly diverge to the boundary.
  • Figure 5: Graphical depiction of the main theorem in terms of the properties of the dynamic coefficient $\frac{1}{1-\beta}$. As $\beta$ varies, the convergence and trajectory velocity change in accordance with the coefficient $1 / (1 - \beta)$. The trajectory velocity is increasing with $\beta$ on $(-\infty, 1)$ and $(1, \infty)$, the orientation is reversed on $(1, \infty)$, and the velocity is faster than the momentum-free case ($\beta = 0$) for $(0, 1)$ and $(1, 2)$.
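
The coefficient $\frac{1}{1-\beta}$ in Figures 3 and 5 can be motivated by a short heuristic for $|\beta| < 1$. It assumes a standard Polyak (heavy-ball) update with buffer $z_0 = 0$ and increment $g(x)$; this notation is introduced here and may differ from the paper's formulation.

$$
z_{t+1} = \beta z_t + g(x_t), \qquad x_{t+1} = x_t + \alpha z_{t+1}
$$

Unrolling the buffer gives a geometric sum. When $\alpha$ is small, $g(x_t)$ changes slowly over the steps that carry most of the weight, so

$$
z_{t+1} = \sum_{k=0}^{t} \beta^{k} g(x_{t-k}) \approx g(x_t) \sum_{k=0}^{\infty} \beta^{k} = \frac{g(x_t)}{1-\beta},
\qquad
x_{t+1} \approx x_t + \frac{\alpha}{1-\beta}\, g(x_t)
$$

The effective step size is therefore about $\alpha / (1-\beta)$, and the number of iterations needed shrinks by roughly a factor of $(1-\beta)$, matching the Polyak scaling reported for Figure 3. The approximation is expected to degrade as $\beta$ approaches 1 or as $\alpha$ grows.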

Theorems & Definitions (5)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • proof