CMA-ES with Learning Rate Adaptation

Masahiro Nomura, Youhei Akimoto, Isao Ono

TL;DR

This work addresses the sensitivity of CMA-ES to its learning rate by introducing a learning rate adaptation (LRA) mechanism that keeps the signal-to-noise ratio (SNR) of the CMA-ES updates at a constant target. Building on information-geometric optimization (IGO) theory and an ordinary differential equation (ODE) view of the update, it argues that a small learning rate lets the updates follow gradient-like trajectories on difficult multimodal and noisy problems, and it derives an approximately optimal learning rate tied to the SNR. The proposed LRA mechanism estimates the SNR in a local coordinate system, adapts separate learning-rate factors for the mean and covariance updates, and adds a step-size correction and a covariance decomposition to maintain stability. Experiments on unimodal, multimodal, and noisy benchmarks indicate that LRA-CMA-ES performs robustly without expensive learning rate tuning and often outperforms PSA-CMA-ES (CMA-ES with population size adaptation) on noisy problems.

Abstract

The covariance matrix adaptation evolution strategy (CMA-ES) is one of the most successful methods for solving continuous black-box optimization problems. A practically useful aspect of the CMA-ES is that it can be used without hyperparameter tuning. However, the hyperparameter settings still have a considerable impact on performance, especially for difficult tasks, such as solving multimodal or noisy problems. This study comprehensively explores the impact of learning rate on the CMA-ES performance and demonstrates the necessity of a small learning rate by considering ordinary differential equations. Thereafter, it discusses the setting of an ideal learning rate. Based on these discussions, we develop a novel learning rate adaptation mechanism for the CMA-ES that maintains a constant signal-to-noise ratio. Additionally, we investigate the behavior of the CMA-ES with the proposed learning rate adaptation mechanism through numerical experiments, and compare the results with those obtained for the CMA-ES with a fixed learning rate and with population size adaptation. The results show that the CMA-ES with the proposed learning rate adaptation works well for multimodal and/or noisy problems without extremely expensive learning rate tuning.
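
The idea of holding the SNR of an update at a target value can be illustrated with a small controller. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the class name `SNRRateController`, the helper `run_controller`, the constants `alpha`, `beta`, and `gamma`, the moving-average SNR estimator, and the choice to scale the target by the current rate (`alpha * eta`) are all illustrative. The controller works on raw update vectors rather than on the CMA-ES local coordinate system, and it omits the step-size correction and covariance decomposition.

```python
import numpy as np


class SNRRateController:
    """Toy learning-rate controller driven by an SNR estimate (illustrative only)."""

    def __init__(self, dim, eta=1.0, alpha=1.4, beta=0.1, gamma=0.1):
        self.eta = eta            # current learning rate, kept in (0, 1]
        self.alpha = alpha        # assumed target SNR per unit learning rate
        self.beta = beta          # assumed smoothing factor of the moving averages
        self.gamma = gamma        # assumed damping of the rate change
        self.avg = np.zeros(dim)  # moving average of the update vector
        self.avg_sq = 0.0         # moving average of its squared norm

    def step(self, delta):
        delta = np.asarray(delta, dtype=float)
        b = self.beta
        self.avg = (1 - b) * self.avg + b * delta
        self.avg_sq = (1 - b) * self.avg_sq + b * float(delta @ delta)

        signal = float(self.avg @ self.avg)
        # For pure noise the averaged vector still has squared norm of about
        # b / (2 - b) times avg_sq, so that part is subtracted from the signal.
        num = signal - b / (2 - b) * self.avg_sq
        den = max(self.avg_sq - signal, 1e-12)  # guard against a zero denominator
        snr = num / den

        # Raise eta when the SNR exceeds the target, lower it otherwise.
        rel = float(np.clip(snr / (self.alpha * self.eta) - 1.0, -1.0, 1.0))
        factor = float(np.exp(min(self.gamma * self.eta, b) * rel))
        self.eta = min(1.0, self.eta * factor)
        return self.eta


def run_controller(deltas, eta0):
    """Feed a sequence of update vectors to the controller; return the final eta."""
    deltas = np.asarray(deltas, dtype=float)
    ctrl = SNRRateController(deltas.shape[1], eta=eta0)
    for d in deltas:
        ctrl.step(d)
    return ctrl.eta
```

Fed with update vectors that point in a consistent direction, this controller pushes the rate toward its upper bound of 1; fed with zero-mean random vectors, it shrinks the rate. That is the qualitative behavior the abstract describes for noisy problems.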

Paper Structure

This paper contains 31 sections, 50 equations, 19 figures, 1 table, 1 algorithm.

Figures (19)

  • Figure 1: Rastrigin function.
  • Figure 2:
  • Figure 4: Typical LRA-CMA-ES behaviors for 10-dimensional (10-D) noiseless problems. The coordinates of $m$ and the square roots of the eigenvalues of $\sigma^2 C$ (denoted by $\sqrt{\text{eig}}$) are indicated with different colors.
  • Figure 5: Typical LRA-CMA-ES behaviors for 10-D noisy problems. The noise variance $\sigma_n^2$ was set to $1$.
  • Figure 6: Success rates according to the number of dimensions (noiseless problems).
  • ...and 14 more figures