Automating Curriculum Learning for Reinforcement Learning using a Skill-Based Bayesian Network

Vincent Hsiao, Mark Roberts, Laura M. Hiatt, George Konidaris, Dana Nau

TL;DR

SEBNs provide a probabilistic framework to automate RL curricula by linking environment features, task targets, and agent competencies (latent or explicit). The approach infers competency levels from past rollouts and uses predicted success on unseen environments to bias curriculum generation via an expected-improvement criterion, without needing explicit evaluation for every candidate task. Empirical results across DoorKey, BipedalWalker, and Robosuite tasks show faster learning and improved robustness, with notable gains in continuous control and robotics domains and competitive performance in gridworld. The work offers a principled path for transfer-aware curriculum design and points to extensions with dynamic skill sets and language-assisted SEBN construction.

Abstract

A major challenge for reinforcement learning is automatically generating curricula to reduce training time or improve performance on some target task. We introduce SEBNs (Skill-Environment Bayesian Networks), which model a probabilistic relationship between a set of skills, a set of goals that relate to the reward structure, and a set of environment features to predict policy performance on (possibly unseen) tasks. We develop an algorithm that uses the estimates of agent success inferred from an SEBN to weight the possible next tasks by expected improvement. We evaluate the benefit of the resulting curriculum on three environments: a discrete gridworld, continuous control, and simulated robotics. The results show that curricula constructed using SEBNs frequently outperform other baselines.
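
To make the task-weighting idea concrete, the following is a minimal sketch of how predicted success probabilities could be turned into sampling weights over candidate tasks. It is not the paper's algorithm: the scoring rule `p * (1 - p)` is an assumed stand-in for expected improvement (it simply favors tasks of intermediate predicted difficulty), and the function names, task labels, and probabilities are all hypothetical. In the actual method the probabilities would come from SEBN inference rather than being hard-coded.

```python
import random


def curriculum_weights(predicted_success):
    """Map {task: predicted success probability} to normalized sampling weights.

    Assumption: p * (1 - p) is a placeholder proxy for expected improvement,
    not the criterion defined in the paper.
    """
    scores = {task: p * (1.0 - p) for task, p in predicted_success.items()}
    total = sum(scores.values())
    if total == 0.0:
        # Every task is predicted to be trivially solved or impossible:
        # fall back to a uniform distribution.
        n = len(scores)
        return {task: 1.0 / n for task in scores}
    return {task: s / total for task, s in scores.items()}


def sample_next_task(predicted_success, rng):
    """Sample the next training task in proportion to its weight."""
    weights = curriculum_weights(predicted_success)
    tasks = list(weights)
    return rng.choices(tasks, weights=[weights[t] for t in tasks], k=1)[0]


# Hypothetical predictions for three candidate tasks.
predicted = {"easy": 0.95, "medium": 0.5, "hard": 0.05}
weights = curriculum_weights(predicted)
next_task = sample_next_task(predicted, random.Random(0))
```

Under this placeholder rule, the "medium" task receives the largest weight, while the nearly solved and nearly impossible tasks receive small, roughly equal weights.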

Paper Structure

This paper contains 33 sections, 2 equations, 11 figures, and 2 algorithms.

Figures (11)

  • Figure 1: Challenge environments for BipedalWalker with corresponding descriptors (P:pit gap, S:stump height, W:stair width, N:stair steps, and R:ground roughness).
  • Figure 2: The SEBN for the BipedalWalker environment.
  • Figure 3: Example environments for DoorKey with corresponding environment features (D:distance, W:wall, L:locked door) and target features (K:key, O:opened, A:at).
  • Figure 4: The SEBN for the DoorKey environment.
  • Figure 5: Result of employing a SEBN-guided automated curriculum on the DoorKey environment.
  • ...and 6 more figures