CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning

Cédric Colas, Pierre Fournier, Olivier Sigaud, Mohamed Chetouani, Pierre-Yves Oudeyer

TL;DR

CURIOUS addresses learning in open-ended environments, where an agent must generate its own goals and form its own curriculum through intrinsic motivation. It represents goals from several modules within a single Universal Value Function Approximator (UVFA), and transfers knowledge through language-agnostic replay across modules and goals, guided by absolute learning progress and hindsight. The framework produces a self-organized developmental curriculum and is shown to be robust to forgetting, sensor perturbations, and distracting goals in a modular Fetch-arm environment. The results show how modular policies and intrinsic motivation can be combined to discover and master a diverse repertoire of controllable tasks without external rewards.

Abstract

In open-ended environments, autonomous learning agents must set their own goals and build their own curriculum through intrinsically motivated exploration. They may consider a large diversity of goals, aiming to discover what is controllable in their environments and what is not. Because some goals might prove easy and some impossible, agents must actively select which goal to practice at any moment, to maximize their overall mastery of the set of learnable goals. This paper proposes CURIOUS, an algorithm that leverages 1) a modular Universal Value Function Approximator with hindsight learning to achieve a diversity of goals of different kinds within a single policy and 2) an automated curriculum learning mechanism that biases the attention of the agent towards goals maximizing the absolute learning progress. Agents focus sequentially on goals of increasing complexity, and focus back on goals that are being forgotten. Experiments conducted in a new modular-goal robotic environment show the resulting developmental self-organization of a learning curriculum, and demonstrate robustness to distracting goals, forgetting, and changes in body properties.
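The curriculum mechanism described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the class name `ModuleLPTracker`, the window-based competence estimate (success rate over recent attempts), the definition of learning progress as the difference between the newer and older halves of that window, and the `epsilon` mixing with a uniform distribution are all hypothetical choices. The sketch only shows why using the absolute value of learning progress steers practice towards modules that are improving or being forgotten, and away from modules that are already mastered or impossible.

```python
import random
from collections import deque


class ModuleLPTracker:
    """Hypothetical tracker of per-module competence and absolute learning progress."""

    def __init__(self, n_modules, window=100, epsilon=0.4, seed=0):
        self.n = n_modules
        self.epsilon = epsilon  # assumed share of uniform exploration
        # One sliding window of recent outcomes (1.0 = success) per module.
        self.results = [deque(maxlen=window) for _ in range(n_modules)]
        self.rng = random.Random(seed)

    def record(self, module, success):
        """Store the outcome of one attempt on the given module."""
        self.results[module].append(1.0 if success else 0.0)

    def absolute_lp(self, module):
        """|recent success rate - older success rate| over the window."""
        outcomes = list(self.results[module])
        half = len(outcomes) // 2
        if half == 0:
            return 0.0
        older = sum(outcomes[:half]) / half
        recent = sum(outcomes[-half:]) / half
        return abs(recent - older)

    def probabilities(self):
        """Mix a uniform distribution with one proportional to absolute LP."""
        lps = [self.absolute_lp(m) for m in range(self.n)]
        total = sum(lps)
        uniform = 1.0 / self.n
        if total == 0.0:
            # No measurable progress anywhere: fall back to uniform sampling.
            return [uniform] * self.n
        return [self.epsilon * uniform + (1.0 - self.epsilon) * lp / total
                for lp in lps]

    def sample_module(self):
        """Draw the index of the next module to practice."""
        return self.rng.choices(range(self.n), weights=self.probabilities(), k=1)[0]


tracker = ModuleLPTracker(n_modules=3, window=20)
for i in range(20):
    tracker.record(0, False)    # impossible module: never succeeds
    tracker.record(1, True)     # mastered module: always succeeds
    tracker.record(2, i >= 10)  # module currently being learned
print(tracker.probabilities())
```

In this toy run, the impossible and the mastered modules both have zero learning progress, so they only receive the uniform share of the probability mass, while the module whose success rate is changing receives most of it. Because the absolute value is used, a module whose success rate drops would also regain priority, which matches the abstract's statement that agents focus back on goals that are being forgotten.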

Paper Structure

This paper contains 42 sections, 5 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: The Modular Goal Fetch Arm environment. An intrinsically motivated agent can set its own (modular) goals (Reach, Push, Pick and Place, Stack), with multiple objects and distractors.
  • Figure 2: Modular goal-parameterized actor-critic architecture (M-UVFA). Toy example with $2$ modules, parameterized by $g_{1}$ (2D) and $g_{2}$ (1D) respectively. Here, the agent is attempting goal $g_1$ in module $M_1$, as specified by the one-hot module descriptor $m_d = \langle 1, 0 \rangle$. The actor (left) computes the action $a_t$. The critic (right) computes the $Q$-value.
  • Figure 3: Schematic view of CURIOUS.
  • Figure 4: Visualization of a single run. a: Module-dependent subjective measures of competence for CURIOUS (1 run). b: Corresponding module-dependent subjective measures of absolute LP. c: Corresponding probabilities $p_{LP}$ to select modules to practice or to learn about.
  • Figure 5: Impact of the policy and value function architecture. Average success rates computed over achievable goals. Mean $\pm$ std over 10 trials are plotted, while dots indicate significance when testing M-UVFA against MG-ME.
  • ...and 4 more figures
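
The toy example in the Figure 2 caption (two modules, a 2D goal $g_1$, a 1D goal $g_2$, and the one-hot descriptor $m_d = \langle 1, 0 \rangle$) can be made concrete with a short sketch of how a single network input might be assembled. This is an illustration under stated assumptions, not the paper's code: the function name `build_policy_input`, the ordering state / descriptor / goals, and the choice to fill the goal slots of inactive modules with zeros are all hypothetical.

```python
def build_policy_input(state, module_index, goal, goal_dims):
    """Concatenate state, one-hot module descriptor, and per-module goal slots.

    goal_dims lists the goal dimension of each module, e.g. [2, 1] for the
    toy example. Only the slot of the attempted module carries the goal;
    the other slots are zero-filled (an assumption of this sketch).
    """
    if not 0 <= module_index < len(goal_dims):
        raise ValueError("module_index out of range")
    if len(goal) != goal_dims[module_index]:
        raise ValueError("goal has the wrong dimension for this module")

    # One-hot descriptor m_d selecting the attempted module.
    descriptor = [1.0 if i == module_index else 0.0
                  for i in range(len(goal_dims))]

    # One slot per module, in a fixed order, so the input size never changes.
    goal_slots = []
    for i, dim in enumerate(goal_dims):
        goal_slots.extend(list(goal) if i == module_index else [0.0] * dim)

    return list(state) + descriptor + goal_slots


# Toy example: a 2D state, attempting goal g_1 = (0.1, 0.2) in module M_1.
x = build_policy_input([0.5, -0.5], 0, [0.1, 0.2], goal_dims=[2, 1])
print(x)  # [0.5, -0.5, 1.0, 0.0, 0.1, 0.2, 0.0]
```

Keeping a fixed-size input regardless of the attempted module is what lets one actor and one critic serve every module, as opposed to training a separate network per module.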