Auto-configuring Exploration-Exploitation Tradeoff in Evolutionary Computation via Deep Reinforcement Learning

Zeyuan Ma; Jiacheng Chen; Hongshu Guo; Yining Ma; Yue-Jiao Gong

TL;DR

This paper tackles the challenge of automatically balancing exploration and exploitation in Evolutionary Computation (EC) by introducing GLEET, a deep reinforcement learning framework with a Transformer-inspired attention network. By formulating dynamic EET tuning as a Markov Decision Process and training via PPO on a class of problems, GLEET learns policies that generalize across problem classes, dimensions, and population sizes, improving backbone EC algorithms such as PSO and DE on augmented CEC2021 benchmarks. Key contributions include a rich per-individual state representation, a fully informed encoder with global population awareness, an EET-focused decoder, and demonstrated zero-shot generalization to protein-docking tasks, along with interpretable insights into learned EET strategies. The approach promises broad applicability to existing EC methods and provides a principled, data-driven pathway to adapting EET in black-box optimization contexts.

Abstract

Evolutionary computation (EC) algorithms, renowned as powerful black-box optimizers, leverage a group of individuals to cooperatively search for the optimum. The exploration-exploitation tradeoff (EET) plays a crucial role in EC but has traditionally been governed by manually designed rules. In this paper, we propose a deep reinforcement learning-based framework that autonomously configures and adapts the EET throughout the EC search process. The framework allows different individuals of the population to selectively attend to the global and local exemplars based on the current search state, maximizing the cooperative search outcome. Our proposed framework is characterized by its simplicity, effectiveness, and generalizability, with the potential to enhance numerous existing EC algorithms. To validate its capabilities, we apply our framework to several representative EC algorithms and conduct extensive experiments on the augmented CEC2021 benchmark. The results demonstrate significant improvements in the performance of the backbone algorithms, as well as favorable generalization across diverse problem classes, dimensions, and population sizes. Additionally, we provide an in-depth analysis of the EET issue by interpreting the learned behaviors of EC.
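To make the idea of per-individual EET control concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard PSO velocity update in which each particle has its own weight toward its personal best (`c1`) and toward the global best (`c2`). The function `toy_policy`, the inertia weight `w=0.7`, the sphere objective, and the linear schedule are all hypothetical stand-ins; in GLEET the per-individual configuration would come from the learned policy rather than a hand-written rule.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, c1, c2, w=0.7, rng=None):
    """One PSO update with per-particle coefficients.

    x, v, pbest: arrays of shape (n, d); gbest: shape (d,).
    c1, c2: arrays of shape (n,), per-particle weights toward the
    personal best and the global best (illustrative EET knobs).
    """
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v_new = (w * v
             + c1[:, None] * r1 * (pbest - x)
             + c2[:, None] * r2 * (gbest - x))
    return x + v_new, v_new

def sphere(x):
    """Toy objective (assumption): sum of squares, minimum at the origin."""
    return np.sum(x ** 2, axis=1)

def toy_policy(t, steps, n):
    """Hypothetical stand-in for a learned policy.

    Shifts weight from the personal best (more exploratory) to the
    global best (more exploitative) as the run progresses.
    """
    frac = t / steps
    c2 = np.full(n, 0.5 + 2.0 * frac)
    c1 = 3.0 - c2
    return c1, c2

def run(n=20, d=5, steps=100, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, (n, d))
    v = np.zeros((n, d))
    pbest, pbest_f = x.copy(), sphere(x)
    init_best = pbest_f.min()
    for t in range(steps):
        gbest = pbest[np.argmin(pbest_f)].copy()
        c1, c2 = toy_policy(t, steps, n)
        x, v = pso_step(x, v, pbest, gbest, c1, c2, rng=rng)
        f = sphere(x)
        improved = f < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
    return init_best, pbest_f.min()

if __name__ == "__main__":
    init_best, final_best = run()
    print(f"best fitness: initial {init_best:.4f}, final {final_best:.4f}")
```

Because `c1` and `c2` are arrays rather than scalars, a controller could assign a different tradeoff to every particle at every step, which is the degree of freedom the abstract describes.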

Paper Structure (40 sections, 16 equations, 10 figures, 9 tables)

Introduction
Related Works
Traditional EET Methods
Learning-based EET Methods
Preliminary and Notations
Deep Reinforcement Learning
Attention Mechanism
Particle Swarm Optimization
Methodology of GLEET
MDP Formulation
State
Action
Reward
Network Design
Feature embedding
...and 25 more sections

Figures (10)

Figure 1: The overview of GLEET as an MDP.
Figure 2: Illustration of our network. The network first embeds the state feature into two components: the EET embedding and the population embedding. Next, a Fully Informed Encoder is employed to attend the population embedding to the individual level. Finally, the individual's EET configuration is determined by decoding the information from the EET embedding using the Exploration-Exploitation Decoder.
Figure 3: Visualization of action distribution changes as the optimization process advances.
Figure 4: Visualization of the attention patterns and the movement of particles during exploration and exploitation controlled by GLEET. In the exploration case, GLEET tends to move the particle as far as possible from its most attended neighbour to maximize exploration. In the exploitation case, GLEET tends to move the particle as close as possible to its most attended neighbour to reach the global optimum.
Figure 5: Generalization across different problem dimensions.
...and 5 more figures
