Mutation Strength Adaptation of the $(μ/μ_I, λ)$-ES for Large Population Sizes on the Sphere Function

Amir Omeradzic; Hans-Georg Beyer

TL;DR

This work investigates how mutation-strength adaptation in a $(\mu/\mu_I,\lambda)$-ES with isotropic mutations behaves for large population sizes on the sphere. It combines theoretical steady-state analyses with experiments to characterize the scale-invariant mutation strength $σ^*$ and a steady-state adaptation factor $γ$ (the ratio of the steady-state $σ^*$ to its largest value that still yields positive progress) across CSA parametrizations and $σ$SA sampling schemes. Only the $\sqrt{N}$ CSA variant maintains a roughly constant $γ$ while delivering strong progress, whereas the other CSA variants adapt more slowly as $N$ or $μ/N$ grows. The performance of $σ$SA depends strongly on the learning parameter $τ$ and on whether log-normal or normal mutation sampling is used, with notable differences in bias and stability. These results inform adaptive population-control strategies for ES in noisy or multimodal optimization settings.

Abstract

The mutation strength adaptation properties of a multi-recombinative $(μ/μ_I, λ)$-ES are studied for isotropic mutations. To this end, standard implementations of cumulative step-size adaptation (CSA) and mutative self-adaptation ($σ$SA) are investigated experimentally and theoretically by assuming large population sizes ($μ$) in relation to the search space dimensionality ($N$). The adaptation is characterized in terms of the scale-invariant mutation strength on the sphere in relation to its maximum achievable value for positive progress. The results show how the different $σ$-adaptation variants behave as $μ$ and $N$ are varied. Standard CSA-variants show notably different adaptation properties and progress rates on the sphere, becoming slower or faster as $μ$ or $N$ are varied. This is shown by investigating common choices for the cumulation and damping parameters. Standard $σ$SA-variants (with default learning parameter settings) can achieve faster adaptation and larger progress rates compared to the CSA. However, it is shown how self-adaptation affects the progress rate levels negatively. Furthermore, differences regarding the adaptation and stability of $σ$SA with log-normal and normal mutation sampling are elaborated.
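
To make the setting concrete, the following is a minimal sketch of a $(μ/μ_I, λ)$-ES with log-normal $σ$SA on the sphere $f(x)=\|x\|^2$. It is not the authors' implementation. The function name, the initialization, the arithmetic averaging of the selected mutation strengths, and the normalization $σ^* = σN/R$ are assumptions made for illustration; the default $τ=1/\sqrt{2N}$ is one of the two values named in the Figure 1 caption.

```python
import numpy as np

def sigma_sa_es_sphere(N=100, mu=100, lam=200, generations=200, tau=None, seed=0):
    """Sketch of a (mu/mu_I, lambda)-ES with log-normal self-adaptation on the sphere.

    Returns the residual distance R and the normalized mutation strength
    sigma* = sigma * N / R for every generation.
    """
    rng = np.random.default_rng(seed)
    tau = 1.0 / np.sqrt(2.0 * N) if tau is None else tau
    x = np.ones(N)                      # parent centroid (arbitrary start, an assumption)
    sigma = np.linalg.norm(x) / N       # start at sigma* = 1 (an assumption)
    R_hist, sigma_norm_hist = [], []
    for _ in range(generations):
        R = np.linalg.norm(x)
        R_hist.append(R)
        sigma_norm_hist.append(sigma * N / R)
        # Each offspring first mutates its own step size (log-normal sampling) ...
        sigmas = sigma * np.exp(tau * rng.standard_normal(lam))
        # ... and then applies an isotropic Gaussian mutation with that step size.
        offspring = x + sigmas[:, None] * rng.standard_normal((lam, N))
        fitness = np.sum(offspring**2, axis=1)
        best = np.argsort(fitness)[:mu]  # truncation selection of the mu best
        # Intermediate recombination: average positions and step sizes.
        x = offspring[best].mean(axis=0)
        sigma = sigmas[best].mean()
    return np.array(R_hist), np.array(sigma_norm_hist)

if __name__ == "__main__":
    R, sigma_norm = sigma_sa_es_sphere(N=100, mu=100, lam=200, generations=200)
    print("final R:", R[-1])
    print("median sigma* over last half:", np.median(sigma_norm[len(sigma_norm) // 2:]))
```

The step size is inherited through selection only, so no cumulation or damping parameters appear here. A CSA variant would replace the two lines that sample and average `sigmas` by a path-based update, whose parametrizations are not given in this summary.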

Paper Structure (9 sections, 67 equations, 7 figures, 2 algorithms)

Introduction
Sphere Progress Rate for Large Populations
Cumulative Step-Size Adaptation
Introduction
Steady-State Analysis
Iteration of CSA Steady-State Equations
Scaling Properties of the CSA
Mutative Self-Adaptation
Conclusion

Figures (7)

Figure 1: Median dynamics on the sphere (10 trials) for $\mu=100$, $\lambda=200$, and $N=100$. Given the $R$-dynamics, the progress rate is measured by evaluating $\varphi^{*,(g)} = (R^{(g)}-R^{(g+1)})\frac{N}{R^{(g)}}$ and averaging it as $\varphi^*_\mathrm{meas}= \mathrm{mean}(\varphi^{*,(g_0:g_\mathrm{end})})$ (with $g_0 \leq g \leq g_\mathrm{end}$ and $g_0=20$ to reduce initialization effects). On the left, one has CSA \ref{eq:sqrtN} (red, $\varphi^*_\mathrm{meas}\approx2.3$), CSA \ref{eq:linN} (magenta, $\varphi^*_\mathrm{meas}\approx0.7$), $\sigma\text{SA}_L$ with $\tau=1/\sqrt{2N}$ (blue, $\varphi^*_\mathrm{meas}\approx3.5$), and $\sigma\text{SA}_L$ with $\tau=1/\sqrt{8N}$ (green, $\varphi^*_\mathrm{meas}\approx1.1$). On the right, one has $\varphi^*$ from \ref{eq:sph_Ndep_pc} (solid red). For the self-adaptive ES, $\varphi^*(\sigma^*,\tau)=(R^{(0)}-R^{(1)})\frac{N}{R^{(0)}}$ was determined using one-generation experiments with $10^4$ trials for $\tau=1/\sqrt{8N}$ (green) and $\tau=1/\sqrt{2N}$ (blue). The vertical lines mark the measured steady-state $\sigma^*_{\mathrm{ss}}$ (same color code as on the left). The second zero of \ref{eq:sph_Ndep_pc} is marked in dashed black.
Figure 2: Sphere progress rate $\varphi^*(\sigma^*)$ for varying population sizes. On the left, the solid lines show \ref{eq:sph_Ndep_pc} and the corresponding data points \ref{sec:dyn_phidef} averaged over $10^4$ trials and normalized using $\varphi^*=\varphi N/R$. The dashed line shows \ref{sec:dyn_phi_large}. On the right, \ref{eq:sph_Ndep_pc} is used to numerically calculate the second zero $\sigma^*_0$ (red solid) and $\hat{\sigma}^* = \operatorname{arg\,max}\varphi^*(\sigma^*)$ (dash-dotted blue). The black dashed line shows approximation \ref{eq:signzero_approx}.
Figure 3: Iteration \ref{eq:iter_csa_1a} of the CSA-dynamics on the sphere for $N=100$ and $\mu=1000$ ($\vartheta=1/2$). One measures $\sigma^*_{\mathrm{ss}}\approx135.51$ with $\sigma^*_0\approx154.5$, giving $\gamma\approx0.88$.
Figure 4: Steady-state $\gamma$ for $(\mu/\mu_I,\lambda)$-CSA-ES ($\vartheta=1/2$) with measured data (solid black with data points) and prediction \ref{eq:csa_gamma_b_v2} (dashed black). The dotted colored curves correspond to Iterations 1A \ref{eq:iter_csa_1a} (green), 1B \ref{eq:iter_csa_1b} (magenta), 2A \ref{eq:iter_csa_2a} (orange), and 2B \ref{eq:iter_csa_2b} (cyan). Iteration 2B agrees with $\gamma$ from \ref{eq:csa_gamma_b_v2}, showing overlapping curves.
Figure 5: Steady-state $\gamma$ for $(\mu/\mu_I,\lambda)$-CSA-ES ($\vartheta=1/2$). Measured ratio $\sigma^*_{\mathrm{ss}}/\sigma^*_0$ (solid, with dots) compared to $\gamma$ from \ref{eq:csa_gamma_b_v2} (dashed) for the CSA variants \ref{eq:sqrtN} (blue), \ref{eq:linN} (red), and \ref{eq:han} (green). $\sigma^*_{\mathrm{ss}}$ is determined as follows. First, the median dynamics $\sigma_M^{*,(g)} = \mathrm{median}(\sigma_i^{*,(g)})$ is determined over all trials $i=1,\dots,M$. Then, $\sigma^*_{\mathrm{ss}} = \mathrm{median}(\sigma_M^{*,(g_\mathrm{end}/2:g_\mathrm{end})})$ is evaluated over the last generations $g=g_\mathrm{end}/2,\dots,g_\mathrm{end}$ to reduce initialization effects. The number of trials ranges from $M=5$ (e.g. for $\mu=2000$ and $N=1000$) to $M=100$ (e.g. for $\mu=N=10$).
...and 2 more figures
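
The measurement procedures described in the captions of Figures 1, 3, and 5 can be sketched as follows. This is an illustrative reading of the captions, not the authors' code. The function names and the array layouts (one row per trial, one column per generation) are assumptions, and the slice over the last half of the generations is a 0-based approximation of $g=g_\mathrm{end}/2,\dots,g_\mathrm{end}$.

```python
import numpy as np

def measured_progress_rate(R, N, g0=20):
    """Mean normalized progress rate from an R-dynamics (Figure 1 caption).

    phi*(g) = (R(g) - R(g+1)) * N / R(g), averaged over generations g >= g0.
    """
    R = np.asarray(R, dtype=float)
    phi = (R[:-1] - R[1:]) * N / R[:-1]
    return phi[g0:].mean()

def steady_state_sigma(sigma_norm_trials):
    """Steady-state sigma* from M trials (Figure 5 caption).

    sigma_norm_trials has shape (M, G): first take the median over trials
    per generation, then the median over the last half of the generations.
    """
    s = np.asarray(sigma_norm_trials, dtype=float)
    median_dynamics = np.median(s, axis=0)
    return np.median(median_dynamics[median_dynamics.size // 2:])

def adaptation_factor(sigma_ss, sigma_zero):
    """gamma = sigma*_ss / sigma*_0, as used in Figures 3 to 5."""
    return sigma_ss / sigma_zero

if __name__ == "__main__":
    # Values quoted in the Figure 3 caption.
    print(round(adaptation_factor(135.51, 154.5), 2))
```

With the values from the Figure 3 caption, the ratio is $135.51/154.5\approx0.877$, which rounds to the quoted $\gamma\approx0.88$.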
