
RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning

Hongqiao Lian, Zeyuan Ma, Hongshu Guo, Ting Huang, Yue-Jiao Gong

TL;DR

RLEMMO addresses multimodal optimization under limited evaluations by introducing a generalizable MetaBBO framework. A meta-level reinforcement learning agent, trained with PPO, flexibly assigns per-individual search strategies during lower-level evolutionary optimization, guided by a landscape-informed state representation and attention-based population sharing. A clustering-based reward encourages both solution quality and diversity, enabling effective meta-training on MMOP families and generalization to unseen problems. On the CEC2013 MMOP benchmark, RLEMMO achieves competitive performance against strong baselines and demonstrates robust generalization, supported by ablation studies that highlight the importance of state features, action diversity, and the clustering reward.

Abstract

Solving multimodal optimization problems (MMOPs) requires finding all optimal solutions, which is challenging under a limited budget of function evaluations. Although existing works balance exploration and exploitation through hand-crafted adaptive strategies, these strategies require expert knowledge and are therefore inflexible when dealing with MMOPs that have different properties. In this paper, we propose RLEMMO, a Meta-Black-Box Optimization framework that maintains a population of solutions and incorporates a reinforcement learning agent to flexibly adjust individual-level search strategies to match the up-to-date optimization status, thereby boosting search performance on MMOPs. Concretely, we encode landscape properties and evolution path information into each individual and then leverage attention networks to advance population information sharing. With a novel reward mechanism that encourages both quality and diversity, RLEMMO can be effectively trained using a policy gradient algorithm. The experimental results on the CEC2013 MMOP benchmark underscore the competitive optimization performance of RLEMMO against several strong baselines.
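
To make the bi-level structure described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical two-action strategy set (`wide_step`, `local_step`), a toy objective, greedy replacement, and a uniformly random `placeholder_policy` standing in for the trained agent. None of these names or choices come from the paper; the real state would encode landscape and evolution-path features, and the real policy would be an attention-based network trained with a policy gradient method.

```python
import math
import random

def objective(x):
    # Toy multimodal function to maximize (assumption, not a CEC2013 function);
    # it peaks wherever every coordinate is an integer.
    return sum(math.cos(2 * math.pi * xi) for xi in x)

# Hypothetical action set: the paper's actual operators are not listed here.
def wide_step(x, rng):
    # Large Gaussian perturbation, meant to favor exploration.
    return [xi + rng.gauss(0.0, 0.5) for xi in x]

def local_step(x, rng):
    # Small Gaussian perturbation, meant to favor exploitation.
    return [xi + rng.gauss(0.0, 0.05) for xi in x]

STRATEGIES = [wide_step, local_step]

def placeholder_policy(state, rng):
    # Stand-in for the trained RL agent: picks a strategy uniformly at random.
    return rng.randrange(len(STRATEGIES))

def run(pop_size=20, dim=2, steps=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-2.0, 2.0) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    for _ in range(steps):
        for i in range(pop_size):
            state = (pop[i], fit[i])  # placeholder for a richer per-individual state
            action = placeholder_policy(state, rng)  # one strategy per individual
            child = STRATEGIES[action](pop[i], rng)
            child_fit = objective(child)
            if child_fit >= fit[i]:  # greedy replacement (assumption)
                pop[i], fit[i] = child, child_fit
    return pop, fit
```

Calling `run()` returns the final population and its fitness values. Replacing `placeholder_policy` with a learned policy is where the meta-level agent would plug in.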

Paper Structure

This paper contains 29 sections, 10 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Blueprint of RLEMMO, where the meta-level RL agent outputs a search strategy for advancing the solution population in the low-level optimization. The meta-level RL agent is meta-trained to maximize the accumulated reward during the low-level optimization.
  • Figure 2: The architecture of the neural networks in RLEMMO, with arrows indicating the overall workflow: at each time step, the state representation of the current solution population is fed into the networks, and individual-level strategies are sampled to advance the population along the low-level optimization process. The critic estimates the return value used for training the policy network.
  • Figure 3: Ablation studies: the average PR and SR of the ablation experiments on the testing dataset, compared at the accuracy level of $10^{-4}$. The sub-figures show, in order, the ablation experiments on state features, action set, and reward mechanism.
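
The Figure 3 caption reports PR (peak ratio) and SR (success rate) at a given accuracy level. As a rough illustration of how these two metrics are commonly computed on MMOP benchmarks, here is a simplified sketch. The helper names (`count_found`, `peak_ratio`, `success_rate`), the niche-radius parameter, and the seed-based counting rule are assumptions made for illustration and may differ in detail from the official CEC2013 evaluation code.

```python
import math

def count_found(solutions, fitnesses, f_opt, eps, radius, num_optima):
    # Count distinct global optima found in one run (maximization assumed).
    # A solution qualifies if its fitness is within eps of f_opt and it lies
    # farther than `radius` from every optimum already counted.
    order = sorted(range(len(solutions)), key=lambda i: fitnesses[i], reverse=True)
    seeds = []
    for i in order:
        if f_opt - fitnesses[i] > eps:
            continue  # not accurate enough at this accuracy level
        if all(math.dist(solutions[i], s) > radius for s in seeds):
            seeds.append(solutions[i])
    return min(len(seeds), num_optima)

def peak_ratio(found_per_run, num_optima):
    # PR: optima found across all runs divided by optima available across all runs.
    return sum(found_per_run) / (num_optima * len(found_per_run))

def success_rate(found_per_run, num_optima):
    # SR: fraction of runs in which every global optimum was found.
    return sum(1 for n in found_per_run if n == num_optima) / len(found_per_run)
```

For example, three runs that find 2, 3, and 3 of 3 known optima give a PR of 8/9 and an SR of 2/3.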