RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning
Hongqiao Lian, Zeyuan Ma, Hongshu Guo, Ting Huang, Yue-Jiao Gong
TL;DR
RLEMMO addresses multimodal optimization under limited evaluations by introducing a generalizable MetaBBO framework. A meta-level reinforcement learning agent, trained with PPO, flexibly assigns per-individual search strategies during lower-level evolutionary optimization, guided by a landscape-informed state representation and attention-based population sharing. A clustering-based reward encourages both solution quality and diversity, enabling effective meta-training on MMOP families and generalization to unseen problems. On the CEC2013 MMOP benchmark, RLEMMO achieves competitive performance against strong baselines and demonstrates robust generalization, supported by ablation studies that highlight the importance of state features, action diversity, and the clustering reward.
Abstract
Solving multimodal optimization problems (MMOPs) requires finding all optimal solutions, which is challenging under a limited budget of function evaluations. Although existing works balance exploration and exploitation through hand-crafted adaptive strategies, these strategies require expert knowledge and are therefore inflexible when dealing with MMOPs that have different properties. In this paper, we propose RLEMMO, a Meta-Black-Box Optimization framework that maintains a population of solutions and incorporates a reinforcement learning agent to flexibly adjust individual-level search strategies according to the current optimization status, thereby boosting search performance on MMOPs. Concretely, we encode landscape properties and evolution path information into each individual and then leverage attention networks to promote information sharing across the population. With a novel reward mechanism that encourages both quality and diversity, RLEMMO can be effectively trained using a policy gradient algorithm. Experimental results on the CEC2013 MMOP benchmark demonstrate the competitive optimization performance of RLEMMO against several strong baselines.
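To make the two-level structure concrete, the following is a minimal sketch of per-individual strategy assignment inside an evolutionary loop. It is not the RLEMMO implementation: the toy objective, the two search operators, the step sizes, and all function names are assumptions introduced for illustration, and the learned agent is replaced by a random placeholder policy where the state encoding, attention network, and PPO-trained policy would sit.

```python
import numpy as np


def toy_objective(x):
    # Hypothetical multimodal test function (maximization) with several
    # equal-height peaks per dimension on [0, 1]; not from CEC2013.
    return float(np.sum(np.sin(5.0 * np.pi * x) ** 6))


def placeholder_policy(pop, fit, rng):
    # Stand-in for the meta-level agent: picks one of two actions per
    # individual uniformly at random. A trained agent would instead map
    # per-individual state features to action probabilities.
    return rng.integers(0, 2, size=len(pop))


def evolve_one_generation(pop, fit, actions, objective, lo, hi, rng, scale=0.5):
    # Apply the operator chosen for each individual, then keep the trial
    # only if it is at least as good as its parent (greedy selection).
    n, dim = pop.shape
    new_pop, new_fit = pop.copy(), fit.copy()
    for i in range(n):
        if actions[i] == 0:
            # Assumed exploratory operator: differential mutation on
            # three distinct, randomly chosen individuals.
            r1, r2, r3 = rng.choice(n, size=3, replace=False)
            trial = pop[r1] + scale * (pop[r2] - pop[r3])
        else:
            # Assumed exploitative operator: small Gaussian step
            # around the individual itself.
            trial = pop[i] + rng.normal(0.0, 0.05 * (hi - lo), size=dim)
        trial = np.clip(trial, lo, hi)
        f_trial = objective(trial)
        if f_trial >= fit[i]:
            new_pop[i], new_fit[i] = trial, f_trial
    return new_pop, new_fit


def run(n=20, dim=2, generations=30, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = 0.0, 1.0
    pop = rng.uniform(lo, hi, size=(n, dim))
    fit = np.array([toy_objective(x) for x in pop])
    init_fit = fit.copy()
    for _ in range(generations):
        actions = placeholder_policy(pop, fit, rng)
        pop, fit = evolve_one_generation(
            pop, fit, actions, toy_objective, lo, hi, rng
        )
    return pop, init_fit, fit


if __name__ == "__main__":
    final_pop, init_fit, final_fit = run()
    print("mean fitness before:", init_fit.mean())
    print("mean fitness after: ", final_fit.mean())
```

Because selection is greedy per individual, each individual's fitness never decreases, and different individuals can settle on different peaks rather than collapsing onto one. In a meta-learned setting, `placeholder_policy` is the component that would be trained across a family of problems.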
