Exploration by Running Away from the Past

Paul-Antoine Le Tolguenec, Yann Besse, Florent Teichteil-Koenigsbuch, Dennis G. Wilson, Emmanuel Rachelson

TL;DR

This work considers exploration through the lens of information theory and demonstrates that by encouraging the agent to explore by actively distancing itself from past experiences, it can effectively explore mazes and a wide range of behaviors on robotic manipulation and locomotion tasks.

Abstract

The ability to explore efficiently and effectively is a central challenge of reinforcement learning. In this work, we consider exploration through the lens of information theory. Specifically, we cast exploration as a problem of maximizing the Shannon entropy of the state occupation measure. This is done by maximizing a sequence of divergences between distributions representing an agent's past behavior and its current behavior. Intuitively, this encourages the agent to explore new behaviors that are distinct from past behaviors. Hence, we call our method RAMP, for ``$\textbf{R}$unning $\textbf{A}$way fro$\textbf{m}$ the $\textbf{P}$ast.'' A fundamental question of this method is the quantification of the distribution change over time. We consider both the Kullback-Leibler divergence and the Wasserstein distance to quantify divergence between successive state occupation measures, and explain why the former might lead to undesirable exploratory behaviors in some tasks. We demonstrate that by encouraging the agent to explore by actively distancing itself from past experiences, it can effectively explore mazes and a wide range of behaviors on robotic manipulation and locomotion tasks.
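The quantities named in the abstract can be made concrete on a toy example. The sketch below is not the RAMP algorithm and uses no code from the paper. It assumes a hypothetical 1-D world with five cells and hand-written occupancy histograms (`past`, `near`, `far`), and the helper names are illustrative only. It computes the Shannon entropy of an occupancy histogram and compares the Kullback-Leibler divergence with the Wasserstein-1 distance between a "past" and a "current" distribution. One generic difference it makes visible, which may or may not be the one the authors have in mind, is that the KL divergence ignores how far apart the states are, whereas the Wasserstein distance grows with that distance.

```python
import math
from itertools import accumulate


def shannon_entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between histograms on the same unit-spaced 1-D grid.

    In one dimension this equals the summed absolute difference of the CDFs.
    """
    return sum(abs(cp - cq) for cp, cq in zip(accumulate(p), accumulate(q)))


# Hypothetical occupancy histograms over five cells (assumed for illustration).
past = [1.0, 0.0, 0.0, 0.0, 0.0]  # the agent has only ever visited cell 0
near = [0.0, 1.0, 0.0, 0.0, 0.0]  # current behavior visits the adjacent cell
far = [0.0, 0.0, 0.0, 0.0, 1.0]   # current behavior visits the farthest cell

# KL treats both candidate behaviors identically: their supports are
# disjoint from the past in both cases, so distance plays no role.
kl_near = kl_divergence(near, past)
kl_far = kl_divergence(far, past)

# Wasserstein-1 grows with how far the probability mass has to move.
w_near = wasserstein_1d(near, past)
w_far = wasserstein_1d(far, past)

# Entropy of the occupancy is largest when all cells are visited equally.
h_uniform = shannon_entropy([0.2] * 5)
h_past = shannon_entropy(past)

print(f"KL near={kl_near:.3f}  KL far={kl_far:.3f}")
print(f"W1 near={w_near:.1f}  W1 far={w_far:.1f}")
print(f"H uniform={h_uniform:.3f}  H past={h_past:.3f}")
```

In this toy setting, both candidate behaviors receive the same KL score, while the Wasserstein distance separates them by how far they move from the past occupancy.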

Paper Structure

This paper contains 25 sections, 7 theorems, 42 equations, 7 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

Figures (7)

  • Figure 1: The four steps of the RAMP algorithm.
  • Figure 2: (a) An illustration of the different experience distributions on the U-maze. (b) A comparison between $\text{RAMP}_{\textcolor{lightblue}{KL}}$ and $\text{RAMP}_{\textcolor{lightcoral}{\mathcal{W}}}$ on HalfCheetah. Color indicates the reward estimate given by $f_{\phi}$, normalized between -1 and 1.
  • Figure 3: $XY$-coordinates of the Ant's torso at different timesteps $T$ of training. Color indicates the density used as reward model for $\text{RAMP}_{\textcolor{lightcoral}{\mathcal{W}}}$.
  • Figure 4: Euclidean coordinates of the states contained in the buffers used by RAMP with the color indicating the output of $f_{\phi}$.
  • Figure 5: Set of tasks.
  • ...and 2 more figures

Theorems & Definitions (11)

  • Theorem 1: Lower Bound on $\Delta_{n+1}$
  • Theorem 2
  • Theorem 3
  • Theorem
  • proof
  • Theorem
  • proof
  • Theorem
  • proof
  • Proposition 4
  • ...and 1 more