
Massively parallel CMA-ES with increasing population

David Redon, Pierre Fortin, Bilel Derbel, Miwako Tsuji, Mitsuhisa Sato

TL;DR

This paper shows how BLAS and LAPACK routines can be introduced into the linear algebra operations of IPOP-CMA-ES, and proposes two strategies for deploying IPOP-CMA-ES efficiently on large-scale parallel architectures with up to thousands of CPU cores.

Abstract

The Increasing Population Covariance Matrix Adaptation Evolution Strategy (IPOP-CMA-ES) algorithm is a reference stochastic optimizer dedicated to blackbox optimization, where no prior knowledge about the underlying problem structure is available. This paper aims at accelerating IPOP-CMA-ES through high-performance computing and parallelism when solving large optimization problems. We first show how BLAS and LAPACK routines can be introduced into the linear algebra operations, and we then propose two strategies for deploying IPOP-CMA-ES efficiently on large-scale parallel architectures with thousands of CPU cores. The first parallel strategy processes the multiple searches in the same order as the sequential IPOP-CMA-ES, while the second processes these searches concurrently. These strategies are implemented in MPI+OpenMP and compared on 6144 cores of the supercomputer Fugaku. We obtain substantial speedups (up to several thousand), including super-linear ones, and we provide an in-depth analysis of our results to explain precisely the superior performance of our second strategy.
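The increasing-population restart scheme behind the "multiple searches" can be illustrated with a short Python sketch. It assumes the standard CMA-ES default population size λ_default = 4 + ⌊3 ln n⌋ for dimension n, and that the k-th search uses K = 2^k times that population, with K running from 1 to 2^8 as in the Figure 5 caption. The helper `split_cores` is hypothetical: it shows one possible way (a proportional split with largest-remainder rounding) to share cores among searches run concurrently, and it is not the authors' K-Distributed allocation.

```python
import math


def default_lambda(n):
    """Standard CMA-ES default population size for dimension n: 4 + floor(3 ln n)."""
    return 4 + math.floor(3 * math.log(n))


def ipop_populations(n, restarts):
    """Assumed IPOP schedule: search k uses K = 2**k times the default population.

    Returns a list of (K, lambda) pairs for k = 0 .. restarts.
    """
    lam0 = default_lambda(n)
    return [(2 ** k, lam0 * 2 ** k) for k in range(restarts + 1)]


def split_cores(total_cores, weights):
    """Hypothetical allocation: give each concurrent search a number of cores
    proportional to its weight (e.g. its K), rounding with the largest-remainder
    method so that the shares sum exactly to total_cores."""
    s = sum(weights)
    exact = [total_cores * w / s for w in weights]
    shares = [math.floor(e) for e in exact]
    leftover = total_cores - sum(shares)
    # Hand the remaining cores to the searches with the largest fractional parts.
    order = sorted(range(len(weights)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

For n = 10, `ipop_populations(10, 8)` yields populations from 10 (K = 1) to 2560 (K = 256), and `split_cores(6144, [1, 2, 4, ..., 256])` gives these nine searches between 12 and 3078 cores. The strategies evaluated in the paper may balance the load differently.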

Paper Structure

This paper contains 14 sections, 2 equations, 9 figures, 5 tables, and 3 algorithms.

Figures (9)

  • Figure 1: Convergence example of CMA-ES on a function space. The white dot indicates the function optimum, the red ellipse the normal distribution, and the red crosses the points sampled from this distribution.
  • Figure 2: Illustration of the core occupancy of a naive version of IPOP-CMA-ES with successive parallel descents.
  • Figure 3: Illustration of the core occupancy of the K-Replicated strategy.
  • Figure 4: Illustration of the K-Distributed algorithm.
  • Figure 5: (upper-left) Performance gains for the eigendecomposition of the $C$ matrix when using LAPACK over the reference C code (written without LAPACK). (upper-right, resp. lower-left) Performance gains for the adaptation of the $C$ matrix (resp. for the sampling) when using Level 2 or Level 3 BLAS over the reference C code (without BLAS). (lower-right) Performance gains over the reference C code (without BLAS and LAPACK) for the whole linear algebra part, with LAPACK for the eigendecomposition and Level 3 BLAS for the $C$ matrix adaptation, when using Level 2 or Level 3 BLAS routines for the sampling. The IPOP columns correspond to an IPOP-CMA-ES execution with successive descents using $K$ from 1 to $2^8$.
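Figure 5 compares Level 2 and Level 3 BLAS for the sampling and for the adaptation of the $C$ matrix. The pure-Python sketch below has no BLAS dependency and only illustrates the algebra behind that choice. Drawing λ offspring x_i = m + σ·BD·z_i with one matrix-vector product per offspring (as a dgemv call would) gives the same result as a single matrix-matrix product over all offspring (as a dgemm call would). Likewise, the rank-μ term Σ w_i y_i y_iᵀ can be built either as μ rank-one updates or as one product Y·diag(w)·Yᵀ. Here `BD` is assumed to stand for the eigenvector matrix B times the diagonal matrix D of square roots of the eigenvalues of $C$, as obtained from the eigendecomposition; all function names are illustrative.

```python
def matvec(A, v):
    """Level 2 BLAS analogue (cf. dgemv): returns A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matmul(A, B):
    """Level 3 BLAS analogue (cf. dgemm): returns A B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def sample_level2(mean, sigma, BD, Z):
    """One matrix-vector product per offspring: x_i = m + sigma * BD z_i.

    Z holds lambda standard-normal vectors z_i (one per row)."""
    X = []
    for z in Z:
        y = matvec(BD, z)
        X.append([m + sigma * yi for m, yi in zip(mean, y)])
    return X


def sample_level3(mean, sigma, BD, Z):
    """One matrix-matrix product for all offspring: [y_1 .. y_lam] = BD [z_1 .. z_lam]."""
    Y = matmul(BD, transpose(Z))  # n x lambda, column i belongs to offspring i
    # Iterate over the columns of Y so that each returned row is one offspring.
    return [[m + sigma * yi for m, yi in zip(mean, col)] for col in zip(*Y)]


def rank_mu_level2(Ys, w):
    """Rank-mu term as a sequence of weighted rank-one updates (cf. dger/dsyr)."""
    n = len(Ys[0])
    C = [[0.0] * n for _ in range(n)]
    for wi, y in zip(w, Ys):
        for r in range(n):
            for c in range(n):
                C[r][c] += wi * y[r] * y[c]
    return C


def rank_mu_level3(Ys, w):
    """Same rank-mu term written as one product Y diag(w) Y^T (cf. dgemm/dsyrk)."""
    Yt = transpose(Ys)  # n x mu, column i is y_i
    YW = [[y * wi for y, wi in zip(row, w)] for row in Yt]  # scale column i by w_i
    return matmul(YW, Ys)
```

In an optimized BLAS, the Level 3 form lets the library process one large product with cache blocking, which is generally why it outperforms a sequence of Level 2 calls; in pure Python both forms cost about the same.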