Sampling in CMA-ES: Low Numbers of Low Discrepancy Points

Jacob de Nobel, Diederick Vermetten, Thomas H. W. Bäck, Anna V. Kononova

TL;DR

This work establishes a clear relation between the $L_2$ discrepancy of the used point set and the empirical performance of the CMA-ES, and shows that iterating through small, fixed sets of low-discrepancy points can still perform better than the default uniform distribution.

Abstract

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is one of the most successful examples of a derandomized evolution strategy. However, it still relies on randomly sampling offspring, which can be done by drawing from a uniform distribution and subsequently transforming the samples into the required Gaussian. Previous work has shown that replacing this uniform sampling with a low-discrepancy sampler, such as Halton or Sobol sequences, can improve performance over a wide set of problems. We show that iterating through small, fixed sets of low-discrepancy points can still perform better than the default uniform distribution. Moreover, using only 128 points throughout the search is sufficient to closely approximate the empirical performance of using the complete pseudorandom sequence up to dimensionality 40 on the BBOB benchmark. For lower dimensionalities (below 10), we find that using as few as 32 unique low-discrepancy points performs similarly to or better than uniform sampling. In 2D, for which we have highly optimized low-discrepancy samples available, we demonstrate that using these points yields the highest empirical performance and requires only 16 samples to improve over uniform sampling. Overall, we establish a clear relation between the $L_2$ discrepancy of the used point set and the empirical performance of the CMA-ES.
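
The cached sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `CachedGaussianSampler`, its parameters, and the choice of a scrambled Sobol sequence from SciPy are assumptions made for illustration. A small set of low-discrepancy points is generated once, mapped to a standard Gaussian through the inverse normal CDF, and then iterated cyclically whenever offspring are needed.

```python
import numpy as np
from scipy.stats import norm, qmc


class CachedGaussianSampler:
    """Hypothetical helper: cycle through a fixed cache of low-discrepancy
    points that have been mapped to a standard Gaussian."""

    def __init__(self, dim, cache_size=128):
        # Generate the cache once. No seed is set here, so the scrambling
        # (and therefore the cache) differs between runs.
        points = qmc.Sobol(d=dim, scramble=True).random(cache_size)
        # Keep points strictly inside (0, 1) so the inverse CDF stays finite.
        points = np.clip(points, 1e-12, 1.0 - 1e-12)
        # Uniform -> Gaussian via the inverse normal CDF, applied per coordinate.
        self.cache = norm.ppf(points)
        self.index = 0

    def sample(self, n):
        """Return the next n cached points, wrapping around at the end."""
        idx = (self.index + np.arange(n)) % len(self.cache)
        self.index = (self.index + n) % len(self.cache)
        return self.cache[idx]


# Usage: draw offspring for a few generations from a 16-point cache.
sampler = CachedGaussianSampler(dim=2, cache_size=16)
for generation in range(3):
    z = sampler.sample(6)  # would stand in for the N(0, I) draws in CMA-ES
    print(generation, z.shape)
```

In a CMA-ES loop, the returned rows would take the place of the standard normal draws that are subsequently scaled by the step size and transformed by the covariance factor. When the population size divides the cache size, every generation reuses the same groups of points, which is one reason the interplay between population size and cache size is worth examining.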

Paper Structure (10 sections, 3 equations, 6 figures)

Figures (6)

  • Figure 1: (Average) $\log_{10}(d^*_2)$ star discrepancy for the generated fixed-size point sets across all dimensionalities. Colors are (min-max) normalized on a per-dimensionality basis; darker colors indicate a worse (higher) $d_2^*$ value.
  • Figure 2: Average area under the EAF curve for each sampling method on the BBOB benchmark, grouped by dimension. Colors are (min-max) normalized on a per-dimensionality basis; darker colors indicate a worse (lower) EAF value.
  • Figure 3: Empirical Attainment Function aggregated over all 24 BBOB functions for dimensionality 2. Each sampling strategy is shown in a separate color; the cache size gives the number of points in the cache, and $\infty$ indicates no caching, i.e., every sample is unique. From left to right, the subfigures show results using a uniform sampler, a Sobol sequence, a scrambled Halton sequence, and the optimized point sets. Note that in the three rightmost figures, the default CMA-ES sampling strategy, UNIFORM-$\infty$, is included for comparison.
  • Figure 4: Average area under the Empirical Attainment Function over all BBOB functions, grouped by dimension vs. the $L_2$ star discrepancy, normalized by dimensionality and the number of points. Lines indicate a linear (least-squares) model for each dimensionality.
  • Figure 5: Empirical Attainment Function aggregated over all 24 BBOB functions for dimensionality 2 (left) and 5 (right), zoomed to final fraction reached. The default sampling strategy of the CMA-ES, UNIFORM-$\infty$, is shown in comparison to using a cached sampling strategy, which uses the 'OPT' samples for a cache size $k \in \{16, 32, 64, 128\}$. The solid lines represent $\lambda = 15$ and the dashed lines $\lambda = 16$.
  • ...and 1 more figure
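
Figures 1 and 4 refer to the $L_2$ star discrepancy $d_2^*$ of a point set. As a hedged illustration, the sketch below computes this quantity with Warnock's closed-form formula, which is a standard way to evaluate it; whether the paper uses exactly this routine is an assumption, and the function name `l2_star_discrepancy` is hypothetical.

```python
import numpy as np


def l2_star_discrepancy(points):
    """L2 star discrepancy of an (n, s) array of points in [0, 1]^s,
    evaluated with Warnock's formula (O(n^2 * s) time and memory)."""
    points = np.asarray(points, dtype=float)
    n, s = points.shape
    # Term 1: integral of the squared box volume over the unit cube.
    term1 = 3.0 ** (-s)
    # Term 2: cross term between the empirical measure and the box volume.
    term2 = (2.0 ** (1 - s) / n) * np.prod(1.0 - points**2, axis=1).sum()
    # Term 3: pairwise term using the coordinate-wise maximum of each pair.
    pair_max = np.maximum(points[:, None, :], points[None, :, :])
    term3 = np.prod(1.0 - pair_max, axis=2).sum() / n**2
    return np.sqrt(term1 - term2 + term3)


# Usage: a single point at 0.5 in one dimension has d_2^* = sqrt(1/12).
print(l2_star_discrepancy(np.array([[0.5]])))

# Compare 64 uniform random points against 64 evenly spaced points in 1D.
rng = np.random.default_rng(0)
print(l2_star_discrepancy(rng.random((64, 1))))
print(l2_star_discrepancy(((np.arange(64) + 0.5) / 64).reshape(-1, 1)))
```

SciPy also offers `scipy.stats.qmc.discrepancy` with `method="L2-star"`, which may be preferable for larger point sets. Lower values indicate a more evenly spread point set, which is the direction the figures associate with better performance.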