CMA-ES with Learning Rate Adaptation

Masahiro Nomura; Youhei Akimoto; Isao Ono

TL;DR

This work addresses the sensitivity of CMA-ES to its learning rate by introducing a learning rate adaptation (LRA) mechanism that keeps the signal-to-noise ratio (SNR) of the CMA-ES updates constant. Grounded in information-geometric optimization (IGO) theory and an ordinary differential equation (ODE) perspective, it shows that a small learning rate lets the updates follow gradient-like trajectories on difficult multimodal and noisy tasks, and it derives an approximately optimal learning rate tied to the SNR. The LRA mechanism estimates the SNR in a local coordinate system, adapts separate learning-rate factors for the mean and covariance updates, and uses a step-size correction and a covariance matrix decomposition to maintain stability. Experiments on unimodal, multimodal, and noisy benchmarks indicate that LRA-CMA-ES performs robustly without expensive learning rate tuning and often outperforms PSA-CMA-ES (CMA-ES with population size adaptation) on noisy problems, which makes it practical for black-box continuous optimization.

Abstract

The covariance matrix adaptation evolution strategy (CMA-ES) is one of the most successful methods for solving continuous black-box optimization problems. A practically useful aspect of the CMA-ES is that it can be used without hyperparameter tuning. However, the hyperparameter settings still have a considerable impact on performance, especially for difficult tasks, such as solving multimodal or noisy problems. This study comprehensively explores the impact of learning rate on the CMA-ES performance and demonstrates the necessity of a small learning rate by considering ordinary differential equations. Thereafter, it discusses the setting of an ideal learning rate. Based on these discussions, we develop a novel learning rate adaptation mechanism for the CMA-ES that maintains a constant signal-to-noise ratio. Additionally, we investigate the behavior of the CMA-ES with the proposed learning rate adaptation mechanism through numerical experiments, and compare the results with those obtained for the CMA-ES with a fixed learning rate and with population size adaptation. The results show that the CMA-ES with the proposed learning rate adaptation works well for multimodal and/or noisy problems without extremely expensive learning rate tuning.
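
The idea of adapting a learning rate so that the signal-to-noise ratio of the updates stays near a target can be illustrated with a small toy controller. The sketch below is not the paper's algorithm: the class name SnrLearningRate, the moving-average SNR estimate, the update rule, and the constants alpha, beta, and gamma are all assumptions chosen for illustration, and the local coordinate system, the separate mean and covariance factors, and the step-size correction are omitted. It only shows the qualitative behavior: consistent update directions raise the learning rate, while pure noise lowers it.

```python
import numpy as np


class SnrLearningRate:
    """Toy SNR-driven learning-rate controller (illustrative assumption only)."""

    def __init__(self, dim, eta=0.1, alpha=1.4, beta=0.1, gamma=0.1):
        self.eta = eta        # current learning rate, kept in (0, 1]
        self.alpha = alpha    # assumed: the target SNR is alpha * eta
        self.beta = beta      # assumed smoothing factor of the moving averages
        self.gamma = gamma    # assumed damping of the learning-rate change
        self.mean_est = np.zeros(dim)  # moving average of the update vectors
        self.sq_est = 0.0              # moving average of their squared norms

    def update(self, delta):
        b = self.beta
        self.mean_est = (1.0 - b) * self.mean_est + b * delta
        self.sq_est = (1.0 - b) * self.sq_est + b * float(delta @ delta)

        signal = float(self.mean_est @ self.mean_est)
        # Subtract the part of ||mean_est||^2 that pure noise would produce.
        numerator = signal - b / (2.0 - b) * self.sq_est
        denominator = max(self.sq_est - signal, 1e-12)
        snr = numerator / denominator

        # Raise eta when the SNR exceeds the target, lower it otherwise.
        relative = float(np.clip(snr / (self.alpha * self.eta) - 1.0, -1.0, 1.0))
        rate = min(self.gamma * self.eta, b)
        self.eta = min(1.0, self.eta * float(np.exp(rate * relative)))
        return self.eta


rng = np.random.default_rng(0)
dim, steps = 10, 500

# Case 1: every update points in the same direction (high SNR).
signal_ctrl = SnrLearningRate(dim)
for _ in range(steps):
    eta_signal = signal_ctrl.update(np.ones(dim))

# Case 2: updates are zero-mean Gaussian noise (SNR near zero).
noise_ctrl = SnrLearningRate(dim)
for _ in range(steps):
    eta_noise = noise_ctrl.update(rng.standard_normal(dim))

print(f"eta after consistent updates: {eta_signal:.3f}")
print(f"eta after pure-noise updates: {eta_noise:.3f}")
```

In this toy setting the controller needs no problem-specific tuning: the learning rate is driven purely by how consistent recent updates are, which is the intuition behind keeping the SNR constant.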

Paper Structure (31 sections, 50 equations, 19 figures, 1 table, 1 algorithm)

Introduction
Background
CMA-ES
IGO
Learning Rate Impact
Relation Between Population Size and Learning Rate
Effect of Decreasing the Learning Rate from an ODE Perspective
Optimal Learning Rate
LRA Mechanism
Main Concept
SNR Estimation
Learning Rate Factor Adaptation
Local Coordinate-System Definition
Covariance Matrix Decomposition
Step-size Correction
...and 16 more sections

Figures (19)

Figure 1: Rastrigin function.
Figure 2:
Figure 4: Typical LRA-CMA-ES behaviors for 10-dimensional (10-D) noiseless problems. The coordinates of $m$ and the square roots of the eigenvalues of $\sigma^2 C$ (denoted by $\sqrt{\text{eig}}$) are indicated with different colors.
Figure 5: Typical LRA-CMA-ES behaviors for 10-D noisy problems. The noise variance $\sigma_n^2$ was set to $1$.
Figure 6: Success rates according to the number of dimensions (noiseless problems).
...and 14 more figures
