Evolutionary Reinforcement Learning via Cooperative Coevolution
Chengpeng Hu, Jialin Liu, Xin Yao
TL;DR
This work tackles the poor scalability of Evolutionary Reinforcement Learning (ERL) on high-dimensional neural networks by introducing CoERL, a cooperative coevolutionary framework that decomposes policy optimisation into multiple subproblems and guides updates with partial gradients. It combines a cooperative coevolution (CC) loop for subproblem evolution with an off-policy RL loop (SAC) to exploit temporal information, achieving an overall $\mathcal{O}(\lvert\theta\rvert)$ update cost per iteration and improved sample efficiency. Key findings include competitive results across six MuJoCo locomotion tasks, clear ablations showing the value of each core component, and insights into behaviour-space inheritance and coordination strategies. The framework offers a scalable, data-efficient pathway for large-scale ERL, with potential extensions in knowledge-based grouping and explainable decomposition for neural networks.
Abstract
Recently, evolutionary reinforcement learning has attracted much attention in various domains. Maintaining a population of actors, evolutionary reinforcement learning utilises the collected experiences to improve the behaviour policy through efficient exploration. However, the poor scalability of genetic operators limits the efficiency of optimising high-dimensional neural networks. To address this issue, this paper proposes a novel cooperative coevolutionary reinforcement learning (CoERL) algorithm. Inspired by cooperative coevolution, CoERL periodically and adaptively decomposes the policy optimisation problem into multiple subproblems and evolves a population of neural networks for each of the subproblems. Instead of using genetic operators, CoERL directly searches for partial gradients to update the policy. Updating the policy with partial gradients maintains consistency between the behaviour spaces of parents and offspring across generations. The experiences collected by the population are then used to improve the entire policy, which enhances the sampling efficiency. Experiments on six benchmark locomotion tasks demonstrate that CoERL outperforms seven state-of-the-art algorithms and baselines. An ablation study verifies the unique contribution of CoERL's core ingredients.
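To make the idea of decomposing the problem and searching for partial gradients concrete, the following is a minimal sketch on a toy objective. It is not the authors' implementation. It assumes a flat parameter vector, a random equal-sized grouping, and a simple evolution-strategy gradient estimate restricted to one group at a time. It omits the SAC loop, the replay buffer, and the adaptive decomposition described in the abstract. All names (`random_groups`, `partial_gradient`, `coevolve`, `toy_fitness`) and all hyperparameter values are hypothetical.

```python
import numpy as np


def random_groups(dim, n_groups, rng):
    """Randomly partition parameter indices into disjoint groups (assumed grouping rule)."""
    return np.array_split(rng.permutation(dim), n_groups)


def partial_gradient(fitness, theta, idx, rng, pop_size=20, sigma=0.1):
    """Estimate the gradient of `fitness` with respect to theta[idx] only.

    A population of candidates is created by perturbing only the coordinates
    in `idx`; all other coordinates are kept fixed at their current values.
    """
    eps = rng.standard_normal((pop_size, idx.size))
    scores = np.empty(pop_size)
    for k in range(pop_size):
        candidate = theta.copy()
        candidate[idx] += sigma * eps[k]
        scores[k] = fitness(candidate)
    # Subtract the mean score as a baseline to reduce variance.
    advantages = scores - scores.mean()
    return eps.T @ advantages / (pop_size * sigma)


def coevolve(fitness, theta, n_groups=4, generations=50, lr=0.05, seed=0):
    """Cycle over subproblems, re-drawing the decomposition every generation."""
    rng = np.random.default_rng(seed)
    theta = theta.copy()
    for _ in range(generations):
        for idx in random_groups(theta.size, n_groups, rng):
            # Gradient ascent step on this subproblem's coordinates only.
            theta[idx] += lr * partial_gradient(fitness, theta, idx, rng)
    return theta


def toy_fitness(theta):
    """Toy stand-in for an episode return: maximised at theta == 1."""
    return -float(np.sum((theta - 1.0) ** 2))


theta0 = np.zeros(12)
result = coevolve(toy_fitness, theta0)
print(toy_fitness(theta0), toy_fitness(result))
```

Because each offspring differs from its parent only along one group's coordinates and by a small step, parent and offspring stay close in parameter space, which is the intuition behind the behaviour-space consistency mentioned in the abstract. In a reinforcement learning setting, `fitness` would be replaced by an episode return, and the transitions gathered during those evaluations could be stored for an off-policy learner.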
