Online Optimization of Curriculum Learning Schedules using Evolutionary Optimization
Mohit Jiwatode, Leon Schlecht, Alexander Dockhorn
TL;DR
This work addresses the burden of manually designing curricula for reinforcement learning by proposing RHEA CL, which combines Curriculum Learning with Rolling Horizon Evolutionary Algorithms to optimize curricula online during PPO-based training. The approach maintains a population of candidate curricula, evaluates them after each curriculum step, and selects the best one as the starting point for the next epoch, thereby adapting task difficulty to the agent’s learning progress. Empirical results on the Minigrid tasks DoorKey and DynamicObstacles show that RHEA CL yields faster early improvements and competitive final performance, at the cost of additional curriculum evaluations during training. The paper also analyzes the effects of hyperparameters (e.g., $nGen$, $curricLength$, $curricCount$) and compares RHEA CL against baselines such as RHRS, SPCL, AllParallel, and vanilla PPO, highlighting the potential of automated curriculum optimization to improve learning speed and robustness in dynamic environments.
Abstract
We propose RHEA CL, which combines Curriculum Learning (CL) with Rolling Horizon Evolutionary Algorithms (RHEA) to automatically produce effective curricula during the training of a reinforcement learning (RL) agent. RHEA CL uses an evolutionary algorithm to optimize a population of curricula and selects the best-performing curriculum as the starting point for the next training epoch. Performance evaluations are conducted after every curriculum step in all environments. We evaluate the algorithm on the \textit{DoorKey} and \textit{DynamicObstacles} environments within the Minigrid framework. RHEA CL demonstrates adaptability and consistent improvement, particularly in the early stages of training, and later reaches a stable performance that can outperform other curriculum learners. Compared to other curriculum schedules, RHEA CL yields performance improvements for the final RL agent at the cost of additional evaluations during training.
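
The loop described above (maintain a population of curricula, score each one after every curriculum step, continue from the best) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the mutation operator, the keep-the-top-half selection, the choice to advance the agent by only the first step of the best curriculum, and above all `toy_train_and_score`, which replaces PPO training and evaluation with a simple scalar "skill" model. The environment labels are illustrative strings, not real Minigrid IDs. The parameters `n_gen`, `curric_length`, and `curric_count` loosely mirror $nGen$, $curricLength$, and $curricCount$.

```python
import random

# Illustrative difficulty ladder (labels only, not real Minigrid environment IDs).
ENVS = ["DoorKey-5x5", "DoorKey-6x6", "DoorKey-8x8", "DoorKey-16x16"]


def toy_train_and_score(skill, curriculum):
    """Hypothetical stand-in for 'train on each step, then evaluate'.

    `skill` is a float standing in for the agent's state. A step helps
    most when its difficulty index is close to the current skill.
    """
    score = 0.0
    for env_idx in curriculum:
        gain = max(0.0, 1.0 - abs(env_idx - skill))  # no gain if far too easy or hard
        skill += 0.5 * gain
        score += skill  # evaluation after every curriculum step
    return skill, score


def mutate(curriculum, rng, n_envs):
    """Assumed mutation operator: resample one randomly chosen step."""
    child = list(curriculum)
    child[rng.randrange(len(child))] = rng.randrange(n_envs)
    return child


def rhea_cl_sketch(epochs=6, n_gen=3, curric_length=3, curric_count=4, seed=0):
    rng = random.Random(seed)
    n_envs = len(ENVS)
    skill = 0.0
    history = []
    population = [
        [rng.randrange(n_envs) for _ in range(curric_length)]
        for _ in range(curric_count)
    ]
    for _ in range(epochs):
        for _ in range(n_gen):
            # Rank curricula by simulated score, best first.
            ranked = sorted(
                population,
                key=lambda c: toy_train_and_score(skill, c)[1],
                reverse=True,
            )
            elite = ranked[: max(1, curric_count // 2)]
            offspring = [
                mutate(rng.choice(elite), rng, n_envs)
                for _ in range(curric_count - len(elite))
            ]
            population = elite + offspring
        best = max(population, key=lambda c: toy_train_and_score(skill, c)[1])
        # Assumption: commit only the first step of the best curriculum,
        # then shift every curriculum forward by one step (rolling horizon).
        skill, _ = toy_train_and_score(skill, best[:1])
        history.append(ENVS[best[0]])
        population = [c[1:] + [rng.randrange(n_envs)] for c in population]
    return skill, history


if __name__ == "__main__":
    final_skill, chosen = rhea_cl_sketch()
    print(final_skill, chosen)
```

In a real setting, each call to the scoring function would involve training copies of the agent and evaluating them, which is where the additional evaluation cost mentioned in the abstract comes from.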
