Abnormal Mutations: Evolution Strategies Don't Require Gaussianity

Jacob de Nobel, Diederick Vermetten, Hao Wang, Anna V. Kononova, Günter Rudolph, Thomas Bäck

TL;DR

Problem: Gaussian mutation has long been presumed essential for Evolution Strategies (ES). Approach: the paper empirically benchmarks multiple ES variants ((1+1)-ES, $(\mu/\mu, \lambda)$-SA-ES, CMA-ES) with alternative mutation distributions on the sphere model and 24 BBOB problems, and analyzes the effective step length $\|\boldsymbol{z}\|_2$ and the angle isotropy $\theta$; path-length normalization is adapted for non-Gaussian cases. Findings: performance is largely distribution-agnostic; local convergence slows with the Cauchy distribution in non-elitist setups, but Cauchy mutations can help on multimodal or neutral landscapes; CMA-ES and elitist variants remain robust under non-Gaussian mutations; isotropy differences largely dissipate when step sizes are matched. Significance: Gaussianity is not a strict prerequisite for ES, which allows flexible mutation design, potential efficiency gains, and avenues for dynamic switching or constraint-aware mutation strategies in practical optimization.

Abstract

The mutation process in evolution strategies has been interlinked with the normal distribution since its inception. Many lines of reasoning have been given for this strong dependency, ranging from maximum entropy arguments to the need for isotropy. However, some theoretical results suggest that other distributions might lead to similar local convergence properties. This paper empirically shows that a wide range of evolutionary strategies, from the (1+1)-ES to CMA-ES, show comparable optimization performance when using a mutation distribution other than the standard Gaussian. Replacing it with, e.g., uniformly distributed mutations, does not deteriorate the performance of ES, when using the default adaptation mechanism for the strategy parameters. We observe that these results hold not only for the sphere model but also for a wider range of benchmark problems.
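The central claim of the abstract can be tried out with a minimal sketch of a (1+1)-ES with a 1/5 success rule on the sphere model, where the mutation sampler is a plug-in argument. This is not the authors' implementation: the function names, the unit-variance parameterization of the uniform and Laplace samplers, and the step-size update constants (`exp(0.8/n)` on success, `exp(-0.2/n)` on failure) are assumptions chosen for illustration.

```python
import math
import random


def sphere(x):
    """Sphere model f(x) = x'x."""
    return sum(xi * xi for xi in x)


# Per-coordinate samplers, each scaled to unit variance.
# This scaling is an assumption; the paper's own parameterization may differ.
def gaussian(rng):
    return rng.gauss(0.0, 1.0)


def uniform(rng):
    half_width = math.sqrt(3.0)  # U(-sqrt(3), sqrt(3)) has variance 1
    return rng.uniform(-half_width, half_width)


def laplace(rng):
    # The difference of two Exp(1) variables is Laplace with variance 2,
    # so scaling by 1/sqrt(2) gives variance 1.
    return (rng.expovariate(1.0) - rng.expovariate(1.0)) / math.sqrt(2.0)


def one_plus_one_es(sampler, n=10, sigma=1.0, target=1e-8,
                    budget=100_000, seed=0):
    """Return the number of evaluations needed to reach `target`, or None."""
    rng = random.Random(seed)
    x = [rng.uniform(-5.0, 5.0) for _ in range(n)]
    fx = sphere(x)
    for evals in range(1, budget + 1):
        y = [xi + sigma * sampler(rng) for xi in x]
        fy = sphere(y)
        if fy <= fx:
            x, fx = y, fy
            sigma *= math.exp(0.8 / n)   # success: enlarge the step size
        else:
            sigma *= math.exp(-0.2 / n)  # failure: shrink the step size
        if fx < target:
            return evals
    return None


if __name__ == "__main__":
    for sampler in (gaussian, uniform, laplace):
        print(sampler.__name__, one_plus_one_es(sampler))
```

The two update factors cancel in expectation exactly when one in five mutations succeeds, which is what makes this a 1/5 success rule; everything else about the loop is independent of the mutation distribution.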

Paper Structure

This paper contains 24 sections, 2 equations, 11 figures, 1 table, 3 algorithms.

Figures (11)

  • Figure 1: Probability density function for the Cauchy, double Weibull, Laplace, logistic, and uniform distributions.
  • Figure 2: Effective step length ($L_2$-norm) for each sampler type, parameterized according to Table 1, for increasing dimensionality $n$. The distributions for which $\|\mathbf{z}\|_2$ scales proportionally to $\sqrt{n}$ are shown in the top figure; Cauchy is shown separately. Note the log-scaling of the y-axis for the bottom figure.
  • Figure 3: Normalized angle distribution of $10^5$ sampled points in dimensionality $n=2$ versus a vector of all ones, i.e., $\mathbf{1}^n$, for each probability distribution, parameterized according to Table 1.
  • Figure 4: Evolution of the mutation rate $\sigma$ of the (1+1)-ES with the 1/5 success rule on the sphere model $f(\mathbf{x}) = \mathbf{x}^\top\mathbf{x}$, averaged over 1000 runs, for dimensionalities $n \in \{2, 10, 50\}$ and different mutation distributions.
  • Figure 5: Hitting times of target precision $10^{-8}$ for the (1+1)-ES with the 1/5 success rule on the sphere model, for different sampling distributions to determine step-size. Left: using the standard distribution. Center: normalized mutation vectors to isolate the effect of isotropy. Right: using sphered versions of the distributions to isolate the effects of effective step size. All distributions are taken over 1000 instances of the sphere model.
  • ...and 6 more figures
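
The scaling behaviour described in the caption of Figure 2 can be checked numerically with a short sketch. It estimates the median of $\|\mathbf{z}\|_2 / \sqrt{n}$ for coordinate-wise samplers; the helper name `scaled_step_length`, the sample count, and the unit-variance scaling of the Gaussian and uniform samplers are assumptions, not the parameterization from the paper's table. The median is used instead of the mean because the Cauchy distribution has no finite moments.

```python
import math
import random
import statistics


def gaussian(rng):
    return rng.gauss(0.0, 1.0)


def uniform(rng):
    half_width = math.sqrt(3.0)  # unit variance (assumed scaling)
    return rng.uniform(-half_width, half_width)


def cauchy(rng):
    # Standard Cauchy via inverse transform sampling.
    return math.tan(math.pi * (rng.random() - 0.5))


def scaled_step_length(sampler, n, samples=500, seed=0):
    """Median of ||z||_2 / sqrt(n) over `samples` mutation vectors z."""
    rng = random.Random(seed)
    norms = [
        math.sqrt(sum(sampler(rng) ** 2 for _ in range(n)))
        for _ in range(samples)
    ]
    return statistics.median(norms) / math.sqrt(n)


if __name__ == "__main__":
    for sampler in (gaussian, uniform, cauchy):
        ratios = [scaled_step_length(sampler, n) for n in (4, 25, 100)]
        print(sampler.__name__, [round(r, 2) for r in ratios])
```

For the finite-variance samplers the printed ratio should settle near a constant as $n$ grows, while for Cauchy it keeps increasing, which is consistent with Cauchy being plotted separately on a log-scaled axis.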