Online Optimization of Curriculum Learning Schedules using Evolutionary Optimization

Mohit Jiwatode, Leon Schlecht, Alexander Dockhorn

TL;DR

This work tackles the challenge of manually designing curricula for reinforcement learning by proposing RHEA CL, which jointly uses Curriculum Learning and Rolling Horizon Evolutionary Algorithms to optimize curricula online during PPO-based training. The approach maintains a population of candidate curricula, evaluates them across curriculum steps, and selects the best to continue in the next epoch, effectively adapting task difficulty to the agent’s learning progress. Empirical results on Minigrid tasks DoorKey and DynamicObstacles show that RHEA CL yields faster early improvements and competitive final performance, at the cost of additional curriculum evaluations during training. The paper also analyzes hyperparameter effects (e.g., $nGen$, $curricLength$, $curricCount$) and compares against baselines such as RHRS, SPCL, AllParallel, and vanilla PPO, highlighting the potential of automated curriculum optimization to enhance learning speed and robustness in dynamic environments.

Abstract

We propose RHEA CL, which combines Curriculum Learning (CL) with Rolling Horizon Evolutionary Algorithms (RHEA) to automatically produce effective curricula during the training of a reinforcement learning (RL) agent. RHEA CL optimizes a population of curricula using an evolutionary algorithm and selects the best-performing curriculum as the starting point for the next training epoch. Performance is evaluated in all environments after every curriculum step. We evaluate the algorithm on the DoorKey and DynamicObstacles environments within the Minigrid framework. RHEA CL demonstrates adaptability and consistent improvement, particularly in the early stages of training, and later reaches a stable performance that can outperform other curriculum learners. Compared to other curriculum schedules, RHEA CL yields performance improvements for the final RL agent at the cost of additional evaluation during training.
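The loop described in the abstract (evolve a population of curricula, evaluate after every curriculum step, carry the best one forward) can be illustrated with a minimal sketch. This is not the authors' implementation. The agent is reduced to a single hypothetical `skill` number, `toy_train_step` and `toy_evaluate` are illustrative stand-ins for PPO training and for evaluation in all environments, and the mutation scheme and the choice to commit only the first step of the winning curriculum are assumptions. The parameters `n_gen`, `curric_length`, and `curric_count` mirror the hyperparameter names mentioned in the TL;DR.

```python
import random
from typing import Callable, List, Tuple

Curriculum = List[int]  # sequence of environment indices, e.g. growing grid sizes


def toy_train_step(skill: float, env: int) -> float:
    """Hypothetical stand-in for one PPO training step: the gain is largest
    when the environment difficulty (its index) is close to the current skill."""
    gain = 0.5 * max(0.0, 1.0 - abs(env - skill))
    return skill + gain


def toy_evaluate(skill: float) -> float:
    """Hypothetical stand-in for evaluating the agent in all environments."""
    return skill


def score_curriculum(
    curriculum: Curriculum,
    skill: float,
    train_step: Callable[[float, int], float],
    evaluate: Callable[[float], float],
) -> Tuple[float, float]:
    """Train along the curriculum and evaluate after every step.
    Returns (summed evaluation score, skill after the final step)."""
    total = 0.0
    for env in curriculum:
        skill = train_step(skill, env)
        total += evaluate(skill)
    return total, skill


def mutate(curriculum: Curriculum, n_envs: int, rng: random.Random,
           rate: float = 0.3) -> Curriculum:
    """Assumed mutation operator: resample each step with probability `rate`."""
    return [rng.randrange(n_envs) if rng.random() < rate else env
            for env in curriculum]


def rhea_cl_epoch(
    skill: float, n_envs: int, curric_count: int, curric_length: int,
    n_gen: int, rng: random.Random,
    train_step: Callable[[float, int], float] = toy_train_step,
    evaluate: Callable[[float], float] = toy_evaluate,
) -> Tuple[Curriculum, float]:
    """One epoch: evolve curricula for n_gen generations, then advance the agent."""
    population = [[rng.randrange(n_envs) for _ in range(curric_length)]
                  for _ in range(curric_count)]
    best_score, best = float("-inf"), population[0]
    for gen in range(n_gen):
        if gen > 0:
            # Keep the best curriculum so far and fill the rest with mutants.
            population = [best] + [mutate(best, n_envs, rng)
                                   for _ in range(curric_count - 1)]
        for candidate in population:
            score, _ = score_curriculum(candidate, skill, train_step, evaluate)
            if score > best_score:
                best_score, best = score, candidate
    # Rolling horizon (assumption): commit only the first step of the winner.
    return best, train_step(skill, best[0])


if __name__ == "__main__":
    rng = random.Random(0)
    skill = 0.0
    for epoch in range(5):
        best, skill = rhea_cl_epoch(skill, n_envs=4, curric_count=3,
                                    curric_length=3, n_gen=2, rng=rng)
        print(epoch, best, round(skill, 3))
```

Every candidate is trained and evaluated at each step, so the number of evaluations per epoch grows with `curric_count * curric_length * n_gen`. This is the additional evaluation cost the abstract refers to.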

Paper Structure

This paper contains 23 sections, 3 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Different sizes of the DoorKey and DynamicObstacles environments. The lighter area indicates the observation space of the agent.
  • Figure 2: Results of the hyperparameter optimization.
  • Figure 3: Comparing test performance of tested curriculum learning methods.
  • Figure 4: Comparing test performance of tested curriculum learning methods with standard deviation.
  • Figure 5: Network Architecture PPO