Probabilistic Curriculum Learning for Goal-Based Reinforcement Learning
Llewyn Salt, Marcus Gallagher
TL;DR
Probabilistic Curriculum Learning (PCL) tackles automatic goal generation for goal-based reinforcement learning in continuous domains by modeling goal feasibility as p(g_t^s|pi) ≈ p(s_{t+N}|s_t,a_t) and learning the distribution with a Mixture Density Network (MDN). Training uses stochastic variational inference (SVI) to optimise a loss L_Theta = - (lambda_1/N) sum log p(g|s_t,a_t) + lambda_2 ||Theta||^2 + lambda_3 D_KL(q(s_{t+1})||p_Theta(g|s_t,a_t)), enabling dynamic goal sampling within quantiles Q_lower, Q_upper. Goals are sampled from a distribution D and filtered by these quantiles to balance difficulty; an adaptive quantile mechanism adjusts bounds based on short-term success rate sr and streak s. Experiments with SAC on DC Motor and Point Maze show faster learning, better generalisation across multiple goals, and improved long-horizon performance compared with a uniform curriculum. This approach removes restrictive initialisations and supports flexible automatic curriculum generation, with potential extensions to other probabilistic models.
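As a rough illustration of the quantile-based goal filtering described above, the following minimal sketch scores candidate goals under a Gaussian mixture and keeps those whose log-density falls between the Q_lower and Q_upper quantiles. It is a sketch under stated assumptions, not the authors' implementation: the mixture parameters are hand-picked stand-ins for the output of a trained MDN, the candidate distribution D is assumed to be uniform, and the function names mixture_log_density and filter_goals_by_quantile are hypothetical. The adaptive update of the bounds from the success rate sr and streak s is omitted because the summary does not specify the rule.

```python
import numpy as np


def mixture_log_density(goals, weights, means, stds):
    """Log-density of each goal under a diagonal-covariance Gaussian mixture.

    goals: (n, d), weights: (k,), means: (k, d), stds: (k, d).
    In PCL these parameters would come from the MDN; here they are fixed by hand.
    """
    diff = (goals[:, None, :] - means[None, :, :]) / stds[None, :, :]
    # Per-component log-density, summed over the d goal dimensions -> (n, k)
    log_comp = -0.5 * np.sum(diff**2 + np.log(2.0 * np.pi * stds[None, :, :] ** 2), axis=-1)
    log_weighted = log_comp + np.log(weights)[None, :]
    # Numerically stable log-sum-exp over the k components
    m = log_weighted.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_weighted - m).sum(axis=1, keepdims=True))).ravel()


def filter_goals_by_quantile(goals, log_density, q_lower, q_upper):
    """Keep goals whose log-density lies between the two empirical quantiles."""
    lo, hi = np.quantile(log_density, [q_lower, q_upper])
    mask = (log_density >= lo) & (log_density <= hi)
    return goals[mask], mask


# Hypothetical setup: 2-D goals sampled uniformly (an assumed choice of D).
rng = np.random.default_rng(0)
candidates = rng.uniform(-1.0, 1.0, size=(1000, 2))

# Stand-in mixture parameters (assumed values, not learned).
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0], [0.5, 0.5]])
stds = np.array([[0.3, 0.3], [0.2, 0.2]])

log_p = mixture_log_density(candidates, weights, means, stds)
selected, mask = filter_goals_by_quantile(candidates, log_p, q_lower=0.25, q_upper=0.75)
print(f"kept {selected.shape[0]} of {candidates.shape[0]} candidate goals")
```

Filtering on empirical quantiles of the density, rather than on fixed probability thresholds, keeps the fraction of retained goals roughly constant as the model changes during training; the lower bound discards goals the model considers nearly unreachable and the upper bound discards goals it considers too easy.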
Abstract
Reinforcement learning (RL) -- algorithms that teach artificial agents to interact with environments by maximising reward signals -- has achieved significant success in recent years. These successes have been facilitated by advances in algorithms (e.g., deep Q-learning, deep deterministic policy gradients, proximal policy optimisation, trust region policy optimisation, and soft actor-critic) and specialised computational resources such as GPUs and TPUs. One promising research direction involves introducing goals to allow multimodal policies, commonly through hierarchical or curriculum reinforcement learning. These methods systematically decompose complex behaviours into simpler sub-tasks, analogous to how humans progressively learn skills (e.g., we learn to walk before we run, and we learn arithmetic before calculus). However, fully automating goal creation remains an open challenge. We present a novel probabilistic curriculum learning algorithm to suggest goals for reinforcement learning agents in continuous control and navigation tasks.
