Enhancing MAP-Elites with Multiple Parallel Evolution Strategies
Manon Flageat, Bryan Lim, Antoine Cully
TL;DR
MEMES tackles the challenge of efficiently leveraging massively parallel evaluations for Quality-Diversity (QD) by running up to 100 parallel Evolution Strategy (ES) emitters on a single GPU. It introduces dynamic per-emitter resets and a FIFO novelty archive to balance exploration and exploitation, improving archive quality and reproducibility on both deterministic and uncertain tasks. Compared with a broad set of baselines, MEMES achieves higher QD-Score and Coverage, and competitive Max-Fitness, on Arm, Hexapod, Ant, and AntTrap, which highlights its robustness to noise and deception. The approach shows that large-scale ES-based QD is practical on common hardware, yielding scalable, reproducible, and diverse solution archives, with potential extensions to objectives beyond task fitness and novelty.
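To make the FIFO novelty archive concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the class name `FifoNoveltyArchive`, the `capacity` and `k` parameters, and the choice of mean distance to the k nearest stored descriptors as the novelty score are all hypothetical.

```python
import math
from collections import deque


class FifoNoveltyArchive:
    """Fixed-capacity store of behaviour descriptors (hypothetical helper).

    When the store is full, adding a descriptor evicts the oldest one.
    """

    def __init__(self, capacity, k=3):
        # deque with maxlen drops the oldest item automatically when full
        self.descriptors = deque(maxlen=capacity)
        self.k = k

    def add(self, descriptor):
        self.descriptors.append(tuple(descriptor))

    def novelty(self, descriptor):
        """Mean Euclidean distance to the k nearest stored descriptors (assumed score)."""
        if not self.descriptors:
            return float("inf")  # nothing stored yet: treat as maximally novel
        distances = sorted(math.dist(descriptor, d) for d in self.descriptors)
        nearest = distances[: self.k]
        return sum(nearest) / len(nearest)
```

A bounded first-in-first-out store keeps memory and per-query cost constant regardless of how many solutions have been evaluated, which matters when many emitters query it every generation.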
Abstract
With the development of fast and massively parallel evaluations in many domains, Quality-Diversity (QD) algorithms, which have already proved promising in a wide range of applications, have seen their potential multiplied. However, we have yet to understand how best to use a large number of evaluations, as spending them on random variations alone is not always effective. High-dimensional search spaces are a typical case where random variations struggle to search effectively. Another is uncertain settings, where solutions can appear better than they truly are and naively evaluating more solutions might mislead QD algorithms. In this work, we propose MAP-Elites-Multi-ES (MEMES), a novel QD algorithm based on Evolution Strategies (ES) and designed to exploit fast parallel evaluations more effectively. MEMES maintains multiple (up to 100) simultaneous ES processes, each with its own independent objective and a reset mechanism designed for QD optimisation, all on a single GPU. We show that MEMES outperforms both gradient-based and mutation-based QD algorithms on black-box optimisation and QD-Reinforcement-Learning tasks, demonstrating its benefit across domains. Additionally, our approach outperforms sampling-based QD methods in uncertain domains when given the same evaluation budget. Overall, MEMES generates reproducible solutions that are high-performing and diverse through large-scale ES optimisation on easily accessible hardware.
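The idea of several ES processes feeding one MAP-Elites archive, each with its own reset, can be sketched as follows. This is a toy illustration under explicit assumptions rather than the MEMES implementation: the task in `evaluate`, the grid resolution, the antithetic ES update, the `patience` threshold, and the rule of resetting a stalled emitter to a uniformly sampled elite are all hypothetical choices. For brevity, every emitter here optimises fitness only and the emitters run sequentially, whereas the abstract describes independent objectives and parallel execution on a GPU.

```python
import random


def evaluate(params):
    """Toy task (hypothetical): negative squared norm, descriptor = clipped params."""
    fitness = -sum(p * p for p in params)
    descriptor = tuple(max(-1.0, min(1.0, p)) for p in params[:2])
    return fitness, descriptor


def cell_of(descriptor, bins=10):
    """Map a descriptor in [-1, 1]^2 to a grid cell index."""
    return tuple(min(bins - 1, int((d + 1.0) / 2.0 * bins)) for d in descriptor)


def try_insert(archive, params, fitness, descriptor):
    """MAP-Elites insertion rule: keep the best solution seen in each cell."""
    cell = cell_of(descriptor)
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, list(params))
        return True
    return False


def es_step(mean, archive, rng, pop_size=16, sigma=0.1, lr=0.1):
    """One simple ES update; returns the new mean and whether the archive improved."""
    half = pop_size // 2
    gradient = [0.0] * len(mean)
    improved = False
    for _ in range(half):
        noise = [rng.gauss(0.0, 1.0) for _ in mean]
        scores = []
        for sign in (1.0, -1.0):  # antithetic (mirrored) sampling
            candidate = [m + sign * sigma * n for m, n in zip(mean, noise)]
            fitness, descriptor = evaluate(candidate)
            improved |= try_insert(archive, candidate, fitness, descriptor)
            scores.append(fitness)
        for i, n in enumerate(noise):
            gradient[i] += (scores[0] - scores[1]) * n
    new_mean = [m + lr * g / (2 * half * sigma) for m, g in zip(mean, gradient)]
    return new_mean, improved


def run(num_emitters=4, generations=50, patience=3, dim=2, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, params)
    means = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_emitters)]
    stalls = [0] * num_emitters
    for _ in range(generations):
        for e in range(num_emitters):  # sequential stand-in for parallel emitters
            means[e], improved = es_step(means[e], archive, rng)
            stalls[e] = 0 if improved else stalls[e] + 1
            if stalls[e] >= patience:  # per-emitter reset (assumed rule)
                means[e] = list(rng.choice(list(archive.values()))[1])
                stalls[e] = 0
    return archive
```

Calling `run()` returns a dictionary from grid cells to `(fitness, params)` pairs. Because each emitter tracks its own stall counter, a reset of one emitter does not interrupt the others.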
