Enhancing MAP-Elites with Multiple Parallel Evolution Strategies

Manon Flageat, Bryan Lim, Antoine Cully

TL;DR

MEMES tackles the challenge of efficiently leveraging massive parallel evaluations for Quality-Diversity by deploying up to ~100 parallel Evolution Strategy emitters on a single GPU. It introduces dynamic per-emitter resets and a FIFO novelty archive to balance exploration and exploitation, achieving higher archive quality and reproducibility across deterministic and uncertain tasks. Compared to a broad set of baselines, MEMES delivers superior QD-Score and Coverage (and competitive Max-Fitness) on Arm, Hexapod, Ant, and AntTrap, highlighting its robustness to noise and deception. The approach demonstrates that large-scale ES-based QD can be practical on common hardware, enabling scalable, reproducible, and diverse solution archives with potential extensions to additional objectives beyond task fitness and novelty.
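
As a rough illustration of what a FIFO novelty archive could look like, the sketch below keeps a fixed number of feature descriptors and evicts the oldest one when full. The class name `FifoNoveltyArchive`, the `capacity` and `k` parameters, and the novelty score (mean Euclidean distance to the k nearest stored descriptors, as in standard novelty search) are assumptions made for this example; the summary above does not specify how MEMES computes novelty.

```python
from collections import deque
import math


class FifoNoveltyArchive:
    """Hypothetical fixed-capacity archive of descriptors with FIFO eviction."""

    def __init__(self, capacity):
        # deque(maxlen=...) drops the oldest entry once the archive is full.
        self._descriptors = deque(maxlen=capacity)

    def add(self, descriptor):
        self._descriptors.append(tuple(descriptor))

    def novelty(self, descriptor, k=3):
        """Mean Euclidean distance to the k nearest stored descriptors.

        This scoring rule is an assumption borrowed from novelty search.
        """
        if not self._descriptors:
            return math.inf  # nothing to compare against yet
        distances = sorted(math.dist(descriptor, d) for d in self._descriptors)
        nearest = distances[:k]
        return sum(nearest) / len(nearest)

    def __len__(self):
        return len(self._descriptors)
```

A possible usage: create the archive with a small `capacity`, call `add` for each evaluated descriptor, and query `novelty` for a candidate. Because eviction is first-in-first-out, novelty is measured against recently visited regions only, so memory stays bounded regardless of run length.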

Abstract

With the development of fast and massively parallel evaluations in many domains, Quality-Diversity (QD) algorithms, which have already proved promising in a wide range of applications, have seen their potential multiplied. However, we have yet to understand how best to use a large number of evaluations, as spending them on random variations alone is not always effective. High-dimensional search spaces are a typical situation in which random variations struggle to search effectively. Another is uncertain settings, where solutions can appear better than they truly are and naively evaluating more solutions might mislead QD algorithms. In this work, we propose MAP-Elites-Multi-ES (MEMES), a novel QD algorithm based on Evolution Strategies (ES) designed to exploit fast parallel evaluations more effectively. MEMES maintains multiple (up to 100) simultaneous ES processes, each with its own independent objective and reset mechanism designed for QD optimisation, all on a single GPU. We show that MEMES outperforms both gradient-based and mutation-based QD algorithms on black-box optimisation and QD-Reinforcement-Learning tasks, demonstrating its benefit across domains. Additionally, our approach outperforms sampling-based QD methods in uncertain domains when given the same evaluation budget. Overall, MEMES generates reproducible solutions that are high-performing and diverse through large-scale ES optimisation on easily accessible hardware.

Paper Structure

This paper contains 41 sections, 5 equations, 12 figures, 3 tables, 3 algorithms.

Figures (12)

  • Figure 1: Final QD-Score (top), Coverage (middle) and Max-Fitness (bottom). We report the median and CI over $10$ seeds.
  • Figure 2: QD-Score loss, which quantifies the ability of algorithms to correctly estimate the performance of solutions in the UQD setting. We report the median and CI over $10$ seeds.
  • Figure 3: Parent-offspring feature-distance for the exploit-ES emitter of MEMES, the PG emitter of PGA-ME and the GA emitter of ME. The solid line is the median and the shaded areas are the quartiles over $10$ replications. We display in red the average width of the cells in each task.
  • Figure 4: Final QD-Score for different fixed reset values, compared with the adaptive reset mechanism. We report the median and CI over $5$ seeds.
  • Figure 5: Final QD-Score for different novelty-computation mechanisms. We report the median and CI over $5$ seeds.
  • ...and 7 more figures
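
For readers unfamiliar with the three metrics named in Figure 1, the sketch below computes them under the definitions commonly used in the QD literature: QD-Score as the sum of the fitnesses stored in the archive, Coverage as the fraction of cells that are filled, and Max-Fitness as the best stored fitness. The function name `qd_metrics`, the dict-based archive, and the `num_cells` parameter are assumptions for illustration; the paper may offset or normalise fitness differently.

```python
def qd_metrics(archive, num_cells):
    """Compute QD metrics for a grid archive.

    archive: dict mapping a cell index to the fitness of the elite in that
             cell (hypothetical representation, empty cells are absent).
    num_cells: total number of cells in the grid.
    """
    fitnesses = list(archive.values())
    if not fitnesses:
        # An empty archive has no elites, so there is no best fitness.
        return {"qd_score": 0.0, "coverage": 0.0, "max_fitness": None}
    return {
        "qd_score": sum(fitnesses),              # total quality across cells
        "coverage": len(fitnesses) / num_cells,  # fraction of filled cells
        "max_fitness": max(fitnesses),           # best single elite
    }
```

Note that summing raw fitnesses only rewards filling new cells when fitness is non-negative, which is why implementations often shift fitness by a task-specific offset before computing QD-Score.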