Probabilistic Curriculum Learning for Goal-Based Reinforcement Learning

Llewyn Salt, Marcus Gallagher

TL;DR

Probabilistic Curriculum Learning (PCL) tackles automatic goal generation for goal-based reinforcement learning in continuous domains by modelling goal feasibility as $p(g_t^s \mid \pi) \approx p(s_{t+N} \mid s_t, a_t)$ and learning this distribution with a Mixture Density Network (MDN). Training uses stochastic variational inference (SVI) to optimise the loss $L_\Theta = -\frac{\lambda_1}{N} \sum \log p(g \mid s_t, a_t) + \lambda_2 \lVert\Theta\rVert^2 + \lambda_3 D_{KL}(q(s_{t+1}) \,\|\, p_\Theta(g \mid s_t, a_t))$, and the learned density enables dynamic goal sampling within the quantiles $Q_{lower}$ and $Q_{upper}$. Goals are sampled from a distribution $D$ and filtered by these quantiles to balance difficulty; an adaptive quantile mechanism adjusts the bounds based on the short-term success rate $sr$ and the streak $s$. Experiments with SAC on DC Motor and Point Maze show faster learning, better generalisation across multiple goals, and improved long-horizon performance compared with a uniform curriculum. The approach removes restrictive initialisations and supports flexible automatic curriculum generation, with potential extensions to other probabilistic models.
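
The quantile filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the hand-set 1-D Gaussian mixture stands in for a trained MDN, the uniform proposal stands in for the distribution D, and the `adapt_quantiles` rule (with its `step` and `target` parameters) is a hypothetical simplification that uses only the success rate and ignores the streak. It also assumes that a higher density means a more reachable, and therefore easier, goal.

```python
import math
import random


def mixture_pdf(g, weights, means, stds):
    """Density of a 1-D Gaussian mixture at g (stand-in for an MDN output)."""
    total = 0.0
    for w, mu, sd in zip(weights, means, stds):
        z = (g - mu) / sd
        total += w * math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return total


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an ascending list, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def filter_goals(candidates, density, q_lower, q_upper):
    """Keep candidates whose density lies between the two density quantiles."""
    scores = [density(g) for g in candidates]
    ordered = sorted(scores)
    lo_cut = quantile(ordered, q_lower)
    hi_cut = quantile(ordered, q_upper)
    return [g for g, s in zip(candidates, scores) if lo_cut <= s <= hi_cut]


def adapt_quantiles(q_lower, q_upper, success_rate, step=0.05, target=0.5):
    """Hypothetical rule: move the band towards lower-density (assumed harder)
    goals when the agent succeeds often, otherwise towards higher density."""
    shift = -step if success_rate > target else step
    width = q_upper - q_lower
    new_lower = min(max(q_lower + shift, 0.0), 1.0 - width)
    return new_lower, min(new_lower + width, 1.0)


def example_density(g):
    # Assumed mixture parameters, chosen only for illustration.
    return mixture_pdf(g, [0.6, 0.4], [-1.0, 1.5], [0.5, 0.8])


rng = random.Random(0)
# Candidate goals drawn from an assumed uniform proposal over [-3, 3].
candidates = [rng.uniform(-3.0, 3.0) for _ in range(1000)]
goals = filter_goals(candidates, example_density, 0.25, 0.75)
```

With the band set to the 0.25 and 0.75 quantiles, roughly half of the candidates survive the filter; calling `adapt_quantiles` after each evaluation window would then slide that band as the agent improves.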

Abstract

Reinforcement learning (RL) -- algorithms that teach artificial agents to interact with environments by maximising reward signals -- has achieved significant success in recent years. These successes have been facilitated by advances in algorithms (e.g., deep Q-learning, deep deterministic policy gradients, proximal policy optimisation, trust region policy optimisation, and soft actor-critic) and specialised computational resources such as GPUs and TPUs. One promising research direction involves introducing goals to allow multimodal policies, commonly through hierarchical or curriculum reinforcement learning. These methods systematically decompose complex behaviours into simpler sub-tasks, analogous to how humans progressively learn skills (e.g., we learn to walk before we run, or we learn arithmetic before calculus). However, fully automating goal creation remains an open challenge. We present a novel probabilistic curriculum learning algorithm to suggest goals for reinforcement learning agents in continuous control and navigation tasks.

Paper Structure

This paper contains 25 sections, 21 equations, 6 figures, 12 tables, 1 algorithm.

Figures (6)

  • Figure 1: Illustration of the interaction between $Q_{upper}$ and $Q_{lower}$, the pdf, and goal selection.
  • Figure 2: The deep mixture density network architecture.
  • Figure 3: DC Motor Coverage and Distribution of Goals.
  • Figure 4: Bidirectional Maze Coverage.
  • Figure 5: 21x21 Square Maze Coverage.
  • ...and 1 more figure