Learning with AMIGo: Adversarially Motivated Intrinsic Goals

Andres Campero, Roberta Raileanu, Heinrich Küttler, Joshua B. Tenenbaum, Tim Rocktäschel, Edward Grefenstette

TL;DR

AMIGo introduces a meta-learning framework in which a goal-generating teacher provides adversarially motivated intrinsic goals to a goal-conditioned student, creating an automatic curriculum that enhances exploration under sparse rewards. The teacher and student optimize an adversarial yet constructive objective, enabling the agent to progressively tackle harder tasks in procedurally generated MiniGrid environments. Across 114 experiments and six tasks, AMIGo delivers state-of-the-art results on challenging environments and demonstrates improved sample efficiency relative to prior intrinsic-motivation methods. The work provides a model-agnostic approach to improving exploration in RL with potential extensions to language goals, partial observability, and continuous-control settings.

Abstract

A key challenge for reinforcement learning (RL) consists of learning in environments with sparse extrinsic rewards. In contrast to current RL methods, humans are able to learn new skills with little or no reward by using various forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating -- as a form of meta-learning -- a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals to train a goal-conditioned "student" policy in the absence of (or alongside) environment reward. Specifically, through a simple but effective "constructively adversarial" objective, the teacher learns to propose increasingly challenging -- yet achievable -- goals that allow the student to learn general skills for acting in a new environment, independent of the task to be solved. We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks where other forms of intrinsic motivation and state-of-the-art RL methods fail.

Paper Structure

This paper contains 19 sections, 4 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Training with AMIGo consists of combining two modules: a goal-generating teacher and a goal-conditioned student policy, whereby the teacher provides intrinsic goals to supplement the extrinsic goals from the environment. In our experimental set-up, the teacher is a dimensionality-preserving convolutional network which, at the beginning of an episode, outputs a location in absolute $(x,y)$ coordinates. These are provided as a one-hot indicator in an extra channel of the student's convolutional neural network, which in turn outputs the agent's actions.
  • Figure 2: Examples of MiniGrid environments. KCharder requires finding the key that unlocks a door blocking the room that contains the goal (the blue ball). OMhard requires a sequence of correct steps, usually involving opening a door, opening a chest to find a key of the correct color, picking up the key to open the door, and opening the door to reach the goal. The configuration and colors of the objects change from one episode to another. To our knowledge, AMIGo is the only algorithm that can solve these tasks. For other examples, see the gym-minigrid repository: https://github.com/maximecb/gym-minigrid.
  • Figure 3: Examples of a curriculum of goals proposed for different episodes of a particular learning trajectory on OMhard. The red triangle is the agent, the red square is the goal proposed by the teacher, and the blue ball is the extrinsic goal. The top panel shows the threshold target difficulty, $t^*$, of the goals proposed by the teacher. The teacher first proposes very easy nearby goals, then learns to propose goals that involve traversing rooms and opening doors, and in the third phase proposes goals that involve removing obstacles and interacting with objects.
  • Figure 4: Reward curves over training time comparing AMIGo to competing methods and baselines. The y-axis shows the Mean Extrinsic Reward (performance) obtained in two medium-difficulty and four harder environments, shown for 30M and 500M frames respectively.
  • Figure 5: An example of a learning trajectory on OMhard, one of the most challenging environments. Despite the lack of extrinsic reward, the panels show the dynamics of the intrinsic rewards for the student (top panel), for the teacher (middle panel), and the difficulty of the goals captured as $t^*$ (bottom panel).
  • ...and 1 more figure
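
The Figure 1 caption says the teacher's goal, a location in absolute $(x,y)$ coordinates, reaches the student as a one-hot indicator in an extra input channel. The following is a minimal sketch of that encoding step only, not the authors' implementation. The function names, the channels-first nested-list layout, and the convention that `y` indexes rows and `x` indexes columns are all assumptions made for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one H x W channel, indexed as grid[row][column]


def goal_channel(height: int, width: int, goal_xy: Tuple[int, int]) -> Grid:
    """Return an H x W grid that is 1.0 at the goal cell and 0.0 elsewhere."""
    x, y = goal_xy
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError(f"goal {goal_xy} lies outside a {width}x{height} grid")
    channel = [[0.0] * width for _ in range(height)]
    channel[y][x] = 1.0  # assumed convention: y is the row, x is the column
    return channel


def add_goal_channel(observation: List[Grid], goal_xy: Tuple[int, int]) -> List[Grid]:
    """Append the one-hot goal channel to a channels-first observation.

    The input observation is not modified; a new list of channels is returned.
    """
    height = len(observation[0])
    width = len(observation[0][0])
    return observation + [goal_channel(height, width, goal_xy)]


# Hypothetical usage: a 2-channel observation of a grid with 3 rows and 4 columns.
obs = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
student_input = add_goal_channel(obs, goal_xy=(2, 1))
```

Here `student_input` has three channels, and the last one contains a single 1.0 at row 1, column 2. Because the goal is an extra channel with the same spatial shape as the observation, a convolutional student sees it aligned with the grid cells it refers to.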