CMA-ES with Learning Rate Adaptation
Masahiro Nomura, Youhei Akimoto, Isao Ono
TL;DR
This work addresses the sensitivity of CMA-ES to its learning rate by introducing Learning Rate Adaptation (LRA), a mechanism that keeps the signal-to-noise ratio (SNR) of the CMA-ES updates at a constant target value. Drawing on information-geometric optimization (IGO) theory and an ordinary differential equation (ODE) perspective, it shows that a small learning rate keeps the updates close to gradient-like trajectories on difficult multimodal and noisy tasks, and it derives an approximately optimal learning rate tied to the SNR. The proposed LRA mechanism estimates the SNR in a local coordinate system, adapts separate learning-rate factors for the mean and covariance updates, and includes a step-size correction and a covariance decomposition to maintain stability. Empirical results on unimodal, multimodal, and noisy benchmarks indicate that LRA-CMA-ES performs robustly without expensive learning rate tuning and often outperforms PSA-CMA-ES (CMA-ES with population size adaptation) on noisy problems, which highlights its practical value for black-box continuous optimization.
Abstract
The covariance matrix adaptation evolution strategy (CMA-ES) is one of the most successful methods for solving continuous black-box optimization problems. A practically useful aspect of the CMA-ES is that it can be used without hyperparameter tuning. However, the hyperparameter settings still have a considerable impact on performance, especially for difficult tasks, such as solving multimodal or noisy problems. This study comprehensively explores the impact of learning rate on the CMA-ES performance and demonstrates the necessity of a small learning rate by considering ordinary differential equations. Thereafter, it discusses the setting of an ideal learning rate. Based on these discussions, we develop a novel learning rate adaptation mechanism for the CMA-ES that maintains a constant signal-to-noise ratio. Additionally, we investigate the behavior of the CMA-ES with the proposed learning rate adaptation mechanism through numerical experiments, and compare the results with those obtained for the CMA-ES with a fixed learning rate and with population size adaptation. The results show that the CMA-ES with the proposed learning rate adaptation works well for multimodal and/or noisy problems without extremely expensive learning rate tuning.
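To make the idea of "maintaining a constant signal-to-noise ratio" concrete, the following is a minimal sketch of an SNR-driven learning rate update. It is not the authors' implementation. The names `lra_step` and `run`, the constants `alpha`, `beta`, and `gamma`, the division guard, and the toy update sequences are all assumptions chosen for illustration, and `delta` stands in for a parameter update that has already been expressed in a suitable local coordinate system.

```python
import math


def lra_step(eta, E, V, delta, alpha=1.4, beta=0.1, gamma=0.1):
    """One SNR-based learning-rate update (constants are illustrative)."""
    # Moving averages of the update vector and of its squared norm.
    E = [(1 - beta) * e + beta * d for e, d in zip(E, delta)]
    V = (1 - beta) * V + beta * sum(d * d for d in delta)

    # SNR estimate: squared mean update over its variance,
    # with a correction for the bias of the moving average.
    e2 = sum(e * e for e in E)
    denom = max(V - e2, 1e-12)  # guard against division by ~0 (assumption)
    snr = (e2 - beta / (2 - beta) * V) / denom

    # Push eta so that snr approaches alpha * eta; clip the relative change.
    rel = max(-1.0, min(1.0, snr / (alpha * eta) - 1.0))
    eta = min(1.0, eta * math.exp(min(gamma * eta, beta) * rel))
    return eta, E, V


def run(deltas, eta=0.5):
    """Feed a sequence of update vectors and return the final learning rate."""
    E, V = [0.0] * len(deltas[0]), 0.0
    for d in deltas:
        eta, E, V = lra_step(eta, E, V, d)
    return eta


# Updates that always point the same way carry a strong signal.
consistent = [[1.0, 0.0]] * 200
# Updates that flip sign every step behave like pure noise.
noisy = [[(-1.0) ** t, 0.0] for t in range(200)]

print(run(consistent), run(noisy))
```

In this toy setting the learning rate grows toward its cap of 1 when successive updates agree and shrinks when they cancel each other, which is the qualitative behavior the abstract describes: a small learning rate on noisy or multimodal problems, obtained without manual tuning.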
