Natural Gradient Interpretation of Rank-One Update in CMA-ES

Ryoki Hamano; Shinichi Shirakawa; Masahiro Nomura

TL;DR

This paper links CMA-ES to information geometry by interpreting the rank-one covariance update as a natural-gradient step with a prior, via a newly proposed MAP-IGO framework. By choosing a normal-inverse-Wishart prior that encodes the evolution-path direction as the promising mean, the authors derive a rank-one update that includes an additional momentum-like term in the mean update, yielding MAP-CMA. Empirical results show MAP-CMA can outperform CMA-ES on problems requiring large mean-vector moves (e.g., Rosenbrock), while recovering CMA-ES behavior as the prior becomes less informative (large $r$). The work broadens the theoretical basis of CMA-ES components and suggests avenues for priors to augment probabilistic search methods.

Abstract

The covariance matrix adaptation evolution strategy (CMA-ES) is a stochastic search algorithm that uses a multivariate normal distribution for continuous black-box optimization. Beyond its strong empirical results, part of the CMA-ES can be described as a stochastic natural gradient method and can be derived from the information geometric optimization (IGO) framework. However, the theoretical understanding of some components of the CMA-ES, such as the rank-one update, remains limited. The rank-one update adapts the covariance matrix to increase the likelihood of generating a solution in the direction of the evolution path, but, unlike the rank-$\mu$ update, this idea has been difficult to formulate and interpret as a natural gradient method. In this work, we provide a new interpretation of the rank-one update in the CMA-ES from the perspective of the natural gradient with a prior distribution. First, we propose maximum a posteriori IGO (MAP-IGO), which extends the IGO framework to incorporate a prior distribution. Then, we derive the rank-one update from the MAP-IGO by setting the prior distribution based on the idea that the promising mean vector should lie in the direction of the evolution path. Moreover, the newly derived rank-one update is extensible: an additional term appears in the update for the mean vector. We empirically investigate the properties of this additional term using various benchmark functions.
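For orientation, the rank-one update discussed in the abstract can be sketched in its commonly presented CMA-ES form. This is not the MAP-IGO derivation or the MAP-CMA update from the paper. The function name `rank_one_update` and the parameter values are illustrative assumptions, and the stall indicator $h_\sigma$ and the rank-$\mu$ term are omitted for brevity.

```python
import numpy as np

def rank_one_update(C, p_c, m_old, m_new, sigma, c_c, c_1, mu_eff):
    """Textbook-style rank-one step (h_sigma and rank-mu term omitted)."""
    # Evolution path: exponentially smoothed, step-size-normalized mean shift.
    y = (m_new - m_old) / sigma
    p_c = (1.0 - c_c) * p_c + np.sqrt(c_c * (2.0 - c_c) * mu_eff) * y
    # Rank-one update: blend C with the outer product of the evolution path,
    # which stretches the distribution along the path direction.
    C = (1.0 - c_1) * C + c_1 * np.outer(p_c, p_c)
    return C, p_c

# Illustrative values only (assumptions, not the paper's settings).
N = 3
C0 = np.eye(N)
p0 = np.zeros(N)
m_old = np.zeros(N)
m_new = np.array([2.0, 0.0, 0.0])
C1, p1 = rank_one_update(C0, p0, m_old, m_new,
                         sigma=1.0, c_c=0.5, c_1=0.2, mu_eff=3.0)
```

In this toy call the mean moved along the first axis, so the variance along that axis grows while the variance in the orthogonal directions shrinks slightly, which is the behavior the abstract describes as increasing the likelihood of sampling along the evolution path.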

Paper Structure

This paper contains 18 sections, 27 equations, 3 figures, 2 tables, and 1 algorithm.

Introduction
Preliminaries
CMA-ES
Information Geometric Optimization
Maximum a Posteriori IGO
Introducing Prior Information to IGO
Equivalence of IGO objective to ML estimation
MAP estimation instead of ML estimation
Natural Gradient Update for MAP-IGO
Natural Gradient for Normal-Inverse-Wishart Distribution
Update Rules for MAP-IGO with Multivariate Normal Distribution
Interpretation of the Rank-one Update with Prior Distribution
Derivation of the Rank-one Update
Interpretation for the Setting of the Prior Distribution
Experiments
...and 3 more sections

Figures (3)

Figure 1: The prior distribution with respect to the mean vector $\mathcal{N} \left( \boldsymbol{m} \mid \boldsymbol{\delta}, \frac{1}{\gamma} \boldsymbol{C} \right)$, where $\boldsymbol{\delta}$ and $\frac{1}{\gamma} \boldsymbol{C}$ are indicated by the orange star and ellipse, respectively. Since $\boldsymbol{\delta} = \boldsymbol{m}^{(t)} + r \sigma^{(t)} \boldsymbol{p}^{(t+1)}_c$ and $\frac{1}{\gamma} \propto r^2$, multiplying $r$ by a constant $r'>1$ corresponds to scaling $\mathcal{N} \left( \boldsymbol{m} \mid \boldsymbol{\delta}, \frac{1}{\gamma} \boldsymbol{C} \right)$ by a factor of $r'$ about $\boldsymbol{m}^{(t)}$.
Figure 2: Transitions of best evaluation value for $N=20$ over 100 independent trials.
Figure 3: Transition of mean vector in one typical trial of optimizing Rosenbrock with $N=20$.
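
The scaling property stated in the Figure 1 caption can be checked numerically. The sketch below uses only the two relations given in the caption, $\boldsymbol{\delta} = \boldsymbol{m}^{(t)} + r \sigma^{(t)} \boldsymbol{p}^{(t+1)}_c$ and $\frac{1}{\gamma} \propto r^2$. The function name `prior_on_mean` and the proportionality constant `k` are hypothetical, since the caption does not give the exact constant.

```python
import numpy as np

def prior_on_mean(m, sigma, p_c, C, r, k=1.0):
    """Center and covariance of the prior N(m | delta, (1/gamma) C).

    k is a hypothetical constant standing in for the unspecified
    proportionality in 1/gamma ∝ r^2.
    """
    delta = m + r * sigma * p_c      # prior center along the evolution path
    cov = (k * r**2) * C             # (1/gamma) C with 1/gamma = k * r^2
    return delta, cov

# Illustrative values only (assumptions).
m = np.array([1.0, -2.0])
p_c = np.array([0.5, 0.25])
C = np.array([[2.0, 0.5], [0.5, 1.0]])
sigma, r, r_prime = 0.5, 2.0, 4.0

delta_a, cov_a = prior_on_mean(m, sigma, p_c, C, r)
delta_b, cov_b = prior_on_mean(m, sigma, p_c, C, r_prime * r)

# Offset from m scales by r'; covariance scales by r'^2, so every axis of
# the ellipse (a standard deviation) scales by r' as well.
offset_ratio = (delta_b - m) / (delta_a - m)
cov_ratio = cov_b / cov_a
```

Because both the offset of the center from $\boldsymbol{m}^{(t)}$ and the ellipse axes scale by $r'$, the whole prior is dilated about $\boldsymbol{m}^{(t)}$, so a larger $r$ gives a broader, less informative prior.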
