Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization

Qiqi Duan, Chang Shao, Guochen Zhou, Minghan Zhang, Qi Zhao, Yuhui Shi

TL;DR

This work tackles large-scale black-box optimization by marrying LM-CMA with a multilevel learning-based meta-framework to exploit distributed computation. The outer CMA-ES governs meta-parameters while multiple inner LM-CMA solvers run in parallel, using isolation time to balance learning progress and communication; the framework introduces elitist and multi-recombination updates, spatiotemporal global step-size adaptation, and collective learning of CMA on structured populations. Empirical results on 2000-dimensional, memory-expensive benchmarks show competitive local and global search performance with quantifiable trade-offs between communication overhead and model richness, aided by a Ray-based distributed implementation. The approach offers a scalable path for distributed black-box optimization, with open-source code to support replication and further development.

Abstract

In the post-Moore era, the main performance gains of black-box optimizers increasingly depend on parallelism, especially for large-scale optimization (LSO). Here we propose to parallelize the well-established covariance matrix adaptation evolution strategy (CMA-ES), and in particular one of its latest LSO variants, called limited-memory CMA-ES (LM-CMA). To achieve efficiency while approximating its powerful invariance property, we present a multilevel learning-based meta-framework for distributed LM-CMA. Owing to its hierarchically organized structure, Meta-ES is well-suited to implement our distributed meta-framework, wherein the outer-ES controls strategy parameters while all parallel inner-ESs run the serial LM-CMA with different settings. For the distribution mean update of the outer-ES, both the elitist and multi-recombination strategies are used in parallel to avoid stagnation and regression, respectively. To exploit spatiotemporal information, the global step-size adaptation combines Meta-ES with the parallel cumulative step-size adaptation. After each isolation time, our meta-framework employs both structure and parameter learning strategies to combine aligned evolution paths for CMA reconstruction. Experiments on a set of large-scale benchmarking functions with memory-intensive evaluations, arguably reflecting many data-driven optimization problems, validate the benefits (e.g., effectiveness w.r.t. solution quality, and adaptability w.r.t. second-order learning) and costs of our meta-framework.
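
The hierarchical outer-ES / inner-ES loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation and rests on several assumptions: a plain (mu/mu, lambda)-ES with a fixed step size stands in for the serial LM-CMA, the inner solvers run sequentially rather than on distributed workers, and the names `sphere`, `inner_es`, and `meta_es`, the step-size factors 0.5 and 2.0, and the way the elitist and recombined means are shared among inner solvers are all hypothetical choices made for illustration.

```python
import random


def sphere(x):
    """Toy objective (assumption); not one of the paper's benchmark functions."""
    return sum(v * v for v in x)


def inner_es(f, mean, sigma, isolation, rng, lam=8, mu=4):
    """Stand-in for one serial inner solver (assumption: a simple
    (mu/mu, lambda)-ES with fixed step size, not LM-CMA)."""
    mean = list(mean)
    best_x, best_f = list(mean), f(mean)
    for _ in range(isolation):  # run undisturbed for the isolation time
        pop = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(lam)]
        pop.sort(key=f)
        if f(pop[0]) < best_f:
            best_x, best_f = pop[0], f(pop[0])
        # multi-recombination inside the inner ES: average the mu best samples
        mean = [sum(col) / mu for col in zip(*pop[:mu])]
    return mean, best_x, best_f


def meta_es(f, dim=10, n_inner=4, outer_iters=30, isolation=10, seed=0):
    rng = random.Random(seed)
    elite = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    elite_f = f(elite)
    recomb = list(elite)
    sigma = 1.0
    history = [elite_f]
    for _ in range(outer_iters):
        results = []
        for i in range(n_inner):
            # Assumption: alternate inner solvers between the elitist start
            # point and the recombined mean, each with a different step size.
            start = elite if i % 2 == 0 else recomb
            s = sigma * (0.5 if (i // 2) % 2 == 0 else 2.0)
            # In a distributed setting this call would be sent to a worker.
            mean, best_x, best_f = inner_es(f, start, s, isolation, rng)
            results.append((best_f, best_x, mean, s))
        results.sort(key=lambda r: r[0])
        # Elitist update: the best-so-far solution never gets worse.
        if results[0][0] < elite_f:
            elite_f, elite = results[0][0], results[0][1]
        # Multi-recombination update: average the means of the better half.
        top = results[: max(1, n_inner // 2)]
        recomb = [sum(col) / len(top) for col in zip(*(r[2] for r in top))]
        # Meta-ES style selection: inherit the step size of the best inner ES.
        sigma = results[0][3]
        history.append(elite_f)
    return elite, elite_f, history


if __name__ == "__main__":
    _, final_f, hist = meta_es(sphere)
    print(f"initial cost {hist[0]:.3e}, final cost {final_f:.3e}")
```

In this sketch the `isolation` argument plays the role of the isolation time: a larger value lets each inner solver make more progress between synchronizations, at the price of less frequent sharing of information at the outer level.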

Paper Structure

This paper contains 17 sections, 5 equations, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: The flowchart diagram of our proposed approach, consisting of four components: hierarchical organization of LM-CMA via Meta-ES, distribution mean update at the outer-ES level, spatiotemporal global step-size adaptation, and collective learning of CMA reconstruction on structured populations.
  • Figure 2: Median convergence curves on a set of 2000-d unimodal functions given the maximal runtime (3 hours) and the cost threshold ($10^{-10}$).
  • Figure 3: Median convergence curves on a set of 2000-d unimodal functions given the maximal runtime (3 hours) and the cost threshold ($10^{-10}$).
  • Figure 4: Median convergence curves on a set of 2000-d multimodal functions given the maximal runtime (3 hours) and the cost threshold ($10^{-10}$).
  • Figure 5: Median convergence curves on a set of 2000-d multimodal functions given the maximal runtime (3 hours) and the cost threshold ($10^{-10}$).
  • ...and 3 more figures