
Autonomous Curriculum Design via Relative Entropy Based Task Modifications

Muhammed Yusuf Satici, Jianxun Wang, David L. Roberts

TL;DR

Curriculum learning can reduce training time but often requires manual design. The authors propose READ-C, an autonomous curriculum design framework that identifies high-uncertainty states using a relative-entropy measure $D_{KL}(P_{true} \,\|\, P_{learnt})$ and modifies the start state of training tasks to steer learning toward those states. They present two implementations, READ-C-TD with a teacher-based uncertainty calculation and READ-C-SA with a self-assessed regressor, and prove convergence under a two-time-scale RL framework. Empirical evaluation across the Key-Lock, Capture-the-Flag, and Parking domains shows that the READ-C variants outperform random curricula and direct target-task learning, with READ-C-SA offering robust, teacher-free gains and heuristic variants further boosting performance. The work demonstrates an uncertainty-driven mechanism for automated curriculum design with practical gains in sample efficiency.
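
For reference, the relative entropy named in the summary is the standard Kullback-Leibler divergence. The form below is a sketch for a discrete action set $A$ at a state $s$. This excerpt does not define $P_{true}$ and $P_{learnt}$, so treating them as per-state action distributions, and the symbols $A$, $a$, and $s$, are assumptions made here for illustration.

$$
D_{KL}\big(P_{true} \,\|\, P_{learnt}\big)(s) = \sum_{a \in A} P_{true}(a \mid s)\, \log \frac{P_{true}(a \mid s)}{P_{learnt}(a \mid s)}
$$

The quantity is non-negative and equals zero only when the two distributions agree at $s$, so larger values mark states where the learner deviates most from the reference distribution.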

Abstract

Curriculum learning is a training method in which an agent is first trained on a curriculum of relatively simple tasks related to a target task, in an effort to shorten the time required to train on the target task. Autonomous curriculum design is the design of such curricula without reliance on human knowledge or expertise. Finding an efficient and effective way to design curricula autonomously remains an open problem. We propose a novel approach for automatically designing curricula that leverages the learner's uncertainty to select curriculum tasks. Our approach measures the uncertainty in the learner's policy using relative entropy and guides the agent to states of high uncertainty to facilitate learning. Our algorithm supports the generation of autonomous curricula in a self-assessed manner by leveraging the learner's past and current policies, and it also allows teacher-guided design in an instructive setting. We provide theoretical guarantees for the convergence of our algorithm using two-time-scale optimization processes. Results show that our algorithm outperforms randomly generated curricula, learning directly on the target task, and existing curriculum-learning criteria from the literature. We also present two additional heuristic distance measures that can be combined with our relative-entropy approach for further performance improvements.
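
The core idea in the abstract, scoring states by relative entropy and starting the agent where the score is highest, can be illustrated with a minimal sketch. This is not the authors' READ-C implementation: the function names, the dictionary-based policy representation, and the toy probabilities are all hypothetical, and the sketch assumes a reference action distribution is available for every state (as a teacher might provide). How the self-assessed variant estimates that reference is not shown here.

```python
import math

def kl_divergence(p_true, p_learnt, eps=1e-12):
    """Relative entropy D_KL(p_true || p_learnt) for two discrete distributions.

    Terms with p_true == 0 contribute nothing; p_learnt is floored at eps
    (an assumption of this sketch) to avoid division by zero.
    """
    return sum(
        p * math.log(p / max(q, eps))
        for p, q in zip(p_true, p_learnt)
        if p > 0.0
    )

def rank_start_states(true_policy, learnt_policy):
    """Return states ordered from highest to lowest uncertainty, plus the scores.

    Both arguments are hypothetical dicts mapping a state label to a list of
    action probabilities over the same action ordering.
    """
    scores = {
        state: kl_divergence(true_policy[state], learnt_policy[state])
        for state in true_policy
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores

# Toy data (made up for illustration): three states, two actions each.
teacher = {"near_goal": [0.9, 0.1], "corridor": [0.5, 0.5], "far_room": [0.1, 0.9]}
learner = {"near_goal": [0.8, 0.2], "corridor": [0.5, 0.5], "far_room": [0.9, 0.1]}

ranked, scores = rank_start_states(teacher, learner)
next_start_state = ranked[0]  # candidate start state for the next curriculum task
```

In this toy case the learner disagrees most with the reference in `far_room`, so that state would be proposed as the next start state, while `corridor` scores zero because the two distributions match there.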

Paper Structure

This paper contains 29 sections, 28 equations, 9 figures, 2 tables, 4 algorithms.

Figures (9)

  • Figure 1: Visualization of the highest uncertainty region for READ-C variants in an environment with a single goal.
  • Figure 2: Performance of the Curriculum-Learning Algorithms as a Function of Training Steps in Key-Lock Domain.
  • Figure 3: Effect of Cluster Size on the Performance of the Curriculum-Learning Algorithms in Key-Lock Domain.
  • Figure 4: Box Plots for the Convergence Times of the Algorithms in Key-Lock Domain.
  • Figure 5: Performance of the Curriculum-Learning Algorithms as a Function of Training Steps in Capture-the-Flag Domain.
  • ...and 4 more figures