Auto-configuring Exploration-Exploitation Tradeoff in Evolutionary Computation via Deep Reinforcement Learning
Zeyuan Ma, Jiacheng Chen, Hongshu Guo, Yining Ma, Yue-Jiao Gong
TL;DR
This paper tackles the challenge of automatically balancing exploration and exploitation in Evolutionary Computation (EC) by introducing GLEET, a deep reinforcement learning framework with a Transformer-inspired attention network. GLEET formulates dynamic tuning of the exploration-exploitation tradeoff (EET) as a Markov Decision Process and is trained via PPO on a class of problems. The learned policies generalize across problem classes, dimensions, and population sizes, and improve backbone EC algorithms such as PSO and DE on augmented CEC2021 benchmarks. Key contributions include a rich per-individual state representation, a fully informed encoder with global population awareness, an EET-focused decoder, demonstrated zero-shot generalization to protein-docking tasks, and interpretable insights into the learned EET strategies. The framework is intended to be applicable to existing EC methods and offers a data-driven way to adapt the EET in black-box optimization.
Abstract
Evolutionary computation (EC) algorithms, renowned as powerful black-box optimizers, leverage a group of individuals to cooperatively search for the optimum. The exploration-exploitation tradeoff (EET) plays a crucial role in EC, yet it has traditionally been governed by manually designed rules. In this paper, we propose a deep reinforcement learning-based framework that autonomously configures and adapts the EET throughout the EC search process. The framework allows different individuals of the population to selectively attend to the global and local exemplars based on the current search state, maximizing the cooperative search outcome. Our proposed framework is characterized by its simplicity, effectiveness, and generalizability, with the potential to enhance numerous existing EC algorithms. To validate its capabilities, we apply our framework to several representative EC algorithms and conduct extensive experiments on the augmented CEC2021 benchmark. The results demonstrate significant improvements in the performance of the backbone algorithms, as well as favorable generalization across diverse problem classes, dimensions, and population sizes. Additionally, we provide an in-depth analysis of the EET issue by interpreting the learned behaviors of EC.
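
To make the idea of a per-individual EET concrete, the following is a minimal sketch, not the authors' implementation. It assumes a PSO backbone in which each particle receives its own value eet[i] in [0, 1] that splits a fixed acceleration budget between the pull toward its personal best (local exemplar) and the pull toward the global best (global exemplar). The function names, the budget c_total, the inertia weight, and the hand-written linear_decay_policy are all hypothetical; in the proposed framework a learned policy would supply these per-individual values from the search state instead.

```python
import numpy as np


def pso_step(pos, vel, pbest, gbest, eet, rng, inertia=0.7, c_total=3.0):
    """One PSO update with a per-particle exploration-exploitation weight.

    eet[i] near 1 weights particle i toward its personal best (local exemplar);
    eet[i] near 0 weights it toward the global best (global exemplar).
    inertia and c_total are illustrative assumptions, not values from the paper.
    """
    eet = np.clip(np.asarray(eet, dtype=float), 0.0, 1.0)[:, None]
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    c1 = c_total * eet          # weight on the local exemplar
    c2 = c_total * (1.0 - eet)  # weight on the global exemplar
    new_vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + new_vel, new_vel


def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return np.sum(x ** 2, axis=1)


def linear_decay_policy(t, steps, n):
    """Hand-written stand-in for a learned policy: same value for all particles,
    decaying linearly from 1 (local pull) to 0 (global pull)."""
    return np.full(n, 1.0 - t / max(steps - 1, 1))


def run(eet_policy, n=20, dim=5, steps=100, seed=0):
    """Run the sketch and return the best objective value found."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, (n, dim))
    vel = np.zeros((n, dim))
    pbest = pos.copy()
    pbest_f = sphere(pos)
    for t in range(steps):
        gbest = pbest[np.argmin(pbest_f)]
        eet = eet_policy(t, steps, n)  # a learned policy would use the search state here
        pos, vel = pso_step(pos, vel, pbest, gbest, eet, rng)
        f = sphere(pos)
        improved = f < pbest_f
        pbest[improved] = pos[improved]
        pbest_f[improved] = f[improved]
    return float(pbest_f.min())


if __name__ == "__main__":
    print(run(linear_decay_policy))
```

Because eet is an array with one entry per particle, a policy is free to return different values for different individuals at the same step, which is the degree of freedom the abstract describes. The linear schedule above is only a manually designed rule of the kind the paper aims to replace.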
