Learning Action-based Representations Using Invariance

Max Rudolph, Caleb Chuck, Kevin Black, Misha Lvovsky, Scott Niekum, Amy Zhang

TL;DR

This work introduces action-bisimulation encoding, a method inspired by the bisimulation invariance pseudometric, that extends single-step controllability with a recursive invariance constraint and demonstrates that action-bisimulation pretraining on reward-free, uniformly random data improves sample efficiency in several environments.

Abstract

Robust reinforcement learning agents using high-dimensional observations must be able to identify relevant state features amidst many exogenous distractors. A representation that captures controllability identifies these state elements by determining what affects agent control. While methods such as inverse dynamics and mutual information capture controllability for a limited number of timesteps, capturing long-horizon elements remains a challenging problem. Myopic controllability can capture the moment right before an agent crashes into a wall, but not the control-relevance of the wall while the agent is still some distance away. To address this, we introduce action-bisimulation encoding, a method inspired by the bisimulation invariance pseudometric, that extends single-step controllability with a recursive invariance constraint. By doing this, action-bisimulation learns a multi-step controllability metric that smoothly discounts distant state features that are relevant for control. We demonstrate that action-bisimulation pretraining on reward-free, uniformly random data improves sample efficiency in several environments, including a photorealistic 3D simulation domain, Habitat. Additionally, we provide theoretical analysis and qualitative results demonstrating the information captured by action-bisimulation.
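The wall example can be made concrete with a small tabular sketch. This is an illustration under stated assumptions, not the paper's implementation: the corridor world, the names `step`, `moved`, `base_distance`, and `apply_operator`, the mixing weight `C`, and the recursive form "(1 - C) times the single-step distance plus C times the average distance between successor states under uniformly random actions" are all hypothetical choices made here. The single-step distance compares only which actions move the agent, standing in for what an inverse-dynamics encoder could distinguish.

```python
from itertools import product

N = 6                # cell N optionally holds a wall; the agent lives in cells 0..N
C = 0.5              # assumed mixing weight in (0, 1) for the recursive term
ACTIONS = (-1, 1)    # move left / move right, sampled uniformly
STATES = [(p, w) for p in range(N + 1) for w in (0, 1)]  # (position, wall flag)


def step(state, a):
    """Deterministic transition: a move is blocked by the boundary or by the wall."""
    p, wall = state
    target = p + a
    blocked = target < 0 or target > N or (wall == 1 and target == N)
    return (p if blocked else target, wall)


def moved(state, a):
    """1 if action a changes the agent's position in this state, else 0."""
    return int(step(state, a)[0] != state[0])


def base_distance(s, t):
    """Single-step (myopic) distance: fraction of actions whose effect differs."""
    return sum(abs(moved(s, a) - moved(t, a)) for a in ACTIONS) / len(ACTIONS)


def apply_operator(d):
    """One application of the assumed recursive operator to a distance table d."""
    new_d = {}
    for s, t in product(STATES, repeat=2):
        future = sum(d[(step(s, a), step(t, a))] for a in ACTIONS) / len(ACTIONS)
        new_d[(s, t)] = (1 - C) * base_distance(s, t) + C * future
    return new_d


def multi_step_distance(iterations=100):
    """Iterate the operator from the all-zero table until it has converged."""
    d = {(s, t): 0.0 for s, t in product(STATES, repeat=2)}
    for _ in range(iterations):
        d = apply_operator(d)
    return d


def wall_sensitivity():
    """For each position, compare the world with a wall to the world without one."""
    d = multi_step_distance()
    rows = []
    for p in range(N):
        with_wall, without_wall = (p, 1), (p, 0)
        rows.append((p, base_distance(with_wall, without_wall),
                     d[(with_wall, without_wall)]))
    return rows


if __name__ == "__main__":
    for p, single, multi in wall_sensitivity():
        print(f"position {p}: single-step {single:.3f}, multi-step {multi:.4f}")
```

In this toy world the single-step distance between the wall and no-wall variants is zero everywhere except in the cell adjacent to the wall, while the recursive distance is positive at every position and shrinks as the agent moves away from the wall. This is the kind of smooth discounting of distant control-relevant features that the abstract describes.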

Paper Structure (40 sections, 6 theorems, 24 equations, 9 figures, 2 tables, 1 algorithm)


Key Result

Theorem A.1

Let $\mathcal{M}$ be the space of bounded pseudometrics on $\mathcal{S}, \mathcal{A}$. Define the operator $\mathcal{F}: \mathcal{M} \to \mathcal{M}$ based on the action-bisimulation distance metric (\ref{actionbisimulation}). Then $\mathcal{F}$ is a contraction mapping and has a unique fixed point for a bounded distance.
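
The operator's definition is not reproduced in this summary, so the following is only a sketch of why a result of this kind typically holds. It assumes a form common to bisimulation-style metrics: a bounded single-step distance $d_0$, a weight $c \in (0,1)$, next-state distributions $P_i, P_j$ from $s_i, s_j$, and the 1-Wasserstein distance $W_1(d)$ computed under $d$. These symbols are assumptions made here, not notation taken from the paper.

```latex
% Assumed operator form (hypothetical): d_0, c, P_i, P_j are illustrative symbols.
\begin{align*}
(\mathcal{F}d)(s_i, s_j)
  &= (1-c)\, d_0(s_i, s_j) + c\, W_1(d)\big(P_i, P_j\big), \\
\big| (\mathcal{F}d)(s_i, s_j) - (\mathcal{F}d')(s_i, s_j) \big|
  &= c\, \big| W_1(d)(P_i, P_j) - W_1(d')(P_i, P_j) \big| \\
  &\le c\, \lVert d - d' \rVert_\infty .
\end{align*}
```

The last step uses the fact that, for any coupling of $P_i$ and $P_j$, the expected value of $d$ exceeds that of $d'$ by at most $\lVert d - d' \rVert_\infty$, and taking the infimum over couplings preserves this bound. Under these assumptions $\mathcal{F}$ contracts with modulus $c < 1$ in the sup norm, and since bounded pseudometrics form a complete space under that norm, the Banach fixed-point theorem gives a unique fixed point.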

Figures (9)

  • Figure 1: Left: mapping states of equivalent controllability together with action-bisimulation. Other methods can be too aggressive (single-step inverse dynamics would map together (a) and (e), reward-based methods would map together (a) and (d)), or too permissive (autoregressive methods would map (a) and (c) to the same value). Right: data flow of action-bisimulation training. The single-step encoder is trained with inverse dynamics (Section \ref{sec:controllability_measures}). The multi-step encoder is trained with bootstrapped single-step representation distance (Equation \ref{actionbisimulation}).
  • Figure 2: Visual representation of the RL environments.
  • Figure 4: States close together in embedding space drawn from the Distractor Pointmaze domain. Notice that action-bisimulation captures both the agent location and the local region of obstacles, while other methods are distracted by the background (ACRO, bVAE) or only capture one-step relations (single step). The agent is exaggerated in these images so it is easier to locate; in reality, it is quite hard to detect because of the distractors.
  • Figure 5: Perturbation map of single step vs action-bisimulation shows encoder distance in 2D Navigation when obstacles are toggled at all locations around the agent (located at the center). Brightness at a pixel indicates the size of the change of representation. Left: The Single Step encoder myopically captures only directly adjacent obstacles. Right: The Multi Step encoder captures more distant obstacles.
  • Figure 6: Left: Violin Plot shows how the representation is sensitive to changes in obstacles near and distant to the agent. Right: Sample observation illustrates the near and distant regions with respect to the agent in the center.
  • ...and 4 more figures

Theorems & Definitions (9)

  • Definition 3.1: Bisimulation Relations (Givan et al., 2003)
  • Definition 3.2: Action-Bisimulation Relations
  • Theorem A.1
  • Definition A.4
  • Theorem A.5
  • Lemma B.1
  • Theorem B.3
  • Lemma B.4
  • Lemma B.5