
Teacher-Student Curriculum Learning

Tambet Matiisen, Avital Oliver, Taco Cohen, John Schulman

TL;DR

Teacher-Student Curriculum Learning (TSCL) automates curriculum design by having a Teacher select subtasks for a Student based on the Student's learning progress. The setup is formalized as a POMDP, with single-task and batch supervision variants. A family of progress-based Teacher algorithms (Online, Naive, Window, Sampling) estimates learning progress from the slope of the Student's score curve and counters forgetting by also prioritizing tasks whose scores are falling. In decimal addition with an LSTM and in Minecraft navigation, TSCL matches or exceeds hand-crafted curricula; in Minecraft it solves a maze that could not be learned by training on it directly, and it learns an order of magnitude faster than uniform sampling of subtasks. The framework reduces the manual effort of curriculum design and adapts automatically over a discrete set of subtasks in both supervised and reinforcement learning settings.
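The Window variant named above estimates learning progress from the slope of a task's recent scores, and forgetting is handled by also favouring tasks whose scores fall. The sketch below is one possible reading of that description, not the authors' implementation: the class name `WindowTeacher`, the window length, and the epsilon-greedy selection rule with its `epsilon` parameter are all assumptions made for illustration.

```python
import random
from collections import deque


def slope(xs, ys):
    """Least-squares slope of ys against xs (0.0 when it is undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / var_x


class WindowTeacher:
    """Hypothetical Window-style Teacher (names and defaults are assumptions).

    Keeps the last `window` (step, score) pairs per task, estimates learning
    progress as the regression slope, and picks a task epsilon-greedily on
    |slope| so that forgetting (a negative slope) also attracts practice.
    """

    def __init__(self, num_tasks, window=10, epsilon=0.1, seed=0):
        self.buffers = [deque(maxlen=window) for _ in range(num_tasks)]
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.step = 0  # global training step, used as the x-axis of the slope

    def progress(self, task):
        buf = self.buffers[task]
        return slope([t for t, _ in buf], [s for _, s in buf])

    def choose_task(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.buffers))  # explore
        priorities = [abs(self.progress(k)) for k in range(len(self.buffers))]
        return max(range(len(priorities)), key=priorities.__getitem__)

    def update(self, task, score):
        self.buffers[task].append((self.step, score))
        self.step += 1


teacher = WindowTeacher(num_tasks=3, epsilon=0.0)
for s in [0.5, 0.5, 0.5]:
    teacher.update(0, s)  # task 0: plateaued
for s in [0.9, 0.7, 0.5]:
    teacher.update(1, s)  # task 1: being forgotten
for s in [0.1, 0.15, 0.2]:
    teacher.update(2, s)  # task 2: slow progress
print(teacher.choose_task())  # → 1
```

Task 1 is chosen because its falling scores give the steepest slope in absolute value, even though task 2 is the only one improving.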

Abstract

We propose Teacher-Student Curriculum Learning (TSCL), a framework for automatic curriculum learning, where the Student tries to learn a complex task and the Teacher automatically chooses subtasks from a given set for the Student to train on. We describe a family of Teacher algorithms that rely on the intuition that the Student should practice more on those tasks on which it makes the fastest progress, i.e. where the slope of the learning curve is highest. In addition, the Teacher algorithms address the problem of forgetting by also choosing tasks where the Student's performance is getting worse. We demonstrate that TSCL matches or surpasses the results of carefully hand-crafted curricula in two tasks: addition of decimal numbers with an LSTM and navigation in Minecraft. Our automatically generated curriculum made it possible to solve a Minecraft maze that could not be solved at all when training directly on the maze, and learning was an order of magnitude faster than with uniform sampling of subtasks.
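The Teacher-Student loop described in the abstract can be simulated end to end with a toy Student. This is an illustrative sketch under stated assumptions, not the paper's code: the hypothetical `ToyStudent` can only improve on task k once task k-1 is mostly learned and slowly forgets tasks it is not practising, while the hypothetical `OnlineTeacher` keeps an exponential moving average of each task's score change and trains the task with the largest absolute value. The smoothing factor `alpha`, `epsilon`, the 0.8 prerequisite threshold and the decay rate are arbitrary choices.

```python
import random


class ToyStudent:
    """Hypothetical Student: task k improves only once task k-1 has skill > 0.8,
    and every task not trained in a step decays slightly (forgetting)."""

    def __init__(self, num_tasks, lr=0.1, decay=0.002):
        self.skill = [0.0] * num_tasks
        self.lr = lr
        self.decay = decay

    def train(self, task):
        ready = task == 0 or self.skill[task - 1] > 0.8
        for k in range(len(self.skill)):
            if k == task:
                if ready:
                    self.skill[k] += self.lr * (1.0 - self.skill[k])
            else:
                self.skill[k] = max(0.0, self.skill[k] - self.decay)
        return self.skill[task]  # score reported to the Teacher


class OnlineTeacher:
    """Hypothetical Online-style Teacher: q[k] is an exponential moving average
    of the score change seen on task k; tasks are picked epsilon-greedily on |q|."""

    def __init__(self, num_tasks, alpha=0.3, epsilon=0.1, seed=0):
        self.q = [0.0] * num_tasks
        self.last_score = [0.0] * num_tasks
        self.alpha = alpha
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose_task(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))  # explore
        return max(range(len(self.q)), key=lambda k: abs(self.q[k]))

    def update(self, task, score):
        reward = score - self.last_score[task]  # progress (or loss) on this task
        self.last_score[task] = score
        self.q[task] = self.alpha * reward + (1 - self.alpha) * self.q[task]


def run(steps=2000, num_tasks=4, seed=0):
    """Teacher picks a task, Student trains on it, Teacher observes the score."""
    student = ToyStudent(num_tasks)
    teacher = OnlineTeacher(num_tasks, seed=seed)
    for _ in range(steps):
        task = teacher.choose_task()
        teacher.update(task, student.train(task))
    return student.skill


print([round(s, 2) for s in run()])
```

Taking the absolute value is what lets one rule cover both cases from the abstract: a large positive average signals fast progress, and a large negative one signals that the Student is forgetting that task.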

Paper Structure

This paper contains 25 sections, 2 equations, 10 figures, 8 algorithms.

Figures (10)

  • Figure 1: The Teacher-Student setup
  • Figure 2: Idealistic curriculum learning. Left: Scores of different tasks improve over time, the next task starts improving once the previous task has been mastered. Right: Probability of sampling a task depends on the slope of the learning curve.
  • Figure 3: Results for 9-digit 1D addition, lower is better. Variants using the absolute value of the expected reward surpass the best manual curriculum ("combined").
  • Figure 4: Progression of the task distribution over time for 9-digit 1D addition (Sampling). The algorithm progresses from simpler tasks to more complicated ones. Harder tasks take longer to learn, and the algorithm keeps training on easier tasks to counter unlearning.
  • Figure 5: Results for 9-digit 2D addition, lower is better. The task seems easier: the manual curriculum is hard to beat, and uniform sampling is competitive.
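Figure 2 states that the probability of sampling a task should depend on the slope of its learning curve, and Figure 4 refers to a Sampling variant. A minimal sketch under one reading of that idea, framed here in the spirit of Thompson sampling (our framing, not confirmed by the text above): keep a buffer of recent score changes per task, draw one change at random from each buffer, and train the task whose draw is largest in absolute value. The class `SamplingTeacher`, the helper `choice_frequencies`, the buffer length and the rule of trying unvisited tasks first are all hypothetical.

```python
import random
from collections import deque


class SamplingTeacher:
    """Hypothetical Sampling-style Teacher: keeps the last `window` score changes
    per task, draws one at random from each buffer, and picks the task whose
    draw has the largest absolute value."""

    def __init__(self, num_tasks, window=10, seed=0):
        self.rewards = [deque(maxlen=window) for _ in range(num_tasks)]
        self.last_score = [0.0] * num_tasks
        self.rng = random.Random(seed)

    def choose_task(self):
        # Tasks with no history yet are tried first so every buffer gets data.
        for k, buf in enumerate(self.rewards):
            if not buf:
                return k
        draws = [abs(self.rng.choice(buf)) for buf in self.rewards]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, task, score):
        self.rewards[task].append(score - self.last_score[task])
        self.last_score[task] = score


def choice_frequencies(teacher, trials=10_000):
    """Empirical probability of each task being chosen, for a fixed history."""
    counts = [0] * len(teacher.rewards)
    for _ in range(trials):
        counts[teacher.choose_task()] += 1
    return [c / trials for c in counts]


teacher = SamplingTeacher(num_tasks=3)
for s in [0.3, 0.3, 0.6]:
    teacher.update(0, s)  # task 0: changes 0.3, 0.0, 0.3 (fast but noisy)
for s in [0.1, 0.2, 0.3]:
    teacher.update(1, s)  # task 1: changes of about 0.1 (slow, steady)
for s in [0.0, 0.0, 0.0]:
    teacher.update(2, s)  # task 2: no change
print(choice_frequencies(teacher))
```

With this history, task 0 should be chosen about two thirds of the time (whenever its draw is 0.3) and task 1 the rest; task 2 never wins because all of its draws are zero. Random draws give a soft, slope-dependent preference instead of always training the single best-looking task.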