State Chrono Representation for Enhancing Generalization in Reinforcement Learning

Jianda Chen, Wen Zheng Terence Ng, Zichen Chen, Sinno Jialin Pan, Tianwei Zhang

TL;DR

SCR augments state metric-based representations by incorporating extensive temporal information into the update step of bisimulation metric learning. It learns state distances within a temporal framework that considers both future dynamics and cumulative rewards over current and long-term future states.

Abstract

In reinforcement learning with image-based inputs, it is crucial to establish a robust and generalizable state representation. Recent advancements in metric learning, such as deep bisimulation metric approaches, have shown promising results in learning a structured low-dimensional representation space from pixel observations, where the distance between states is measured based on task-relevant features. However, these approaches face challenges in demanding generalization tasks and scenarios with non-informative rewards, because they fail to capture sufficient long-term information in the learned representations. To address these challenges, we propose a novel State Chrono Representation (SCR) approach. SCR augments state metric-based representations by incorporating extensive temporal information into the update step of bisimulation metric learning. It learns state distances within a temporal framework that considers both future dynamics and cumulative rewards over current and long-term future states. Our learning strategy effectively incorporates future behavioral information into the representation space without introducing a significant number of additional parameters for modeling dynamics. Extensive experiments conducted in DeepMind Control and Meta-World environments demonstrate that SCR achieves better performance than other recent metric-based methods in demanding generalization tasks. The code for SCR is available at https://github.com/jianda-chen/SCR.

Paper Structure

This paper contains 37 sections, 13 theorems, 31 equations, 13 figures, 8 tables, 1 algorithm.

Key Result

Theorem 2.1

The $\pi$-bisimulation metric update operator $\mathcal{F}_{bisim}: \mathbb{M} \to \mathbb{M}$ is defined as $$\mathcal{F}_{bisim} d(\mathbf{x}, \mathbf{y}) := |r^{\pi}_{\mathbf{x}} - r^{\pi}_{\mathbf{y}}| + \gamma \mathcal{W}(d)(P^{\pi}_{\mathbf{x}}, P^{\pi}_{\mathbf{y}}),$$ where $\mathbb{M}$ is the space of $d$, $\gamma$ is the discount factor, $r^{\pi}_{\mathbf{x}} = \sum_{a \in \mathcal{A}} \pi(a|\mathbf{x})r^{a}_{\mathbf{x}}$, $P^{\pi}_{\mathbf{x}}=\sum_{a \in \mathcal{A}} \pi(a|\mathbf{x})P^{a}_{\mathbf{x}}$, $\mathcal{W}$ is the Wasserstein distance, and $r^{a}_{\mathbf{x}}$ is ...
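
To make the update operator concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the standard form of the operator, $\mathcal{F}_{bisim} d(\mathbf{x}, \mathbf{y}) = |r^{\pi}_{\mathbf{x}} - r^{\pi}_{\mathbf{y}}| + \gamma \mathcal{W}(d)(P^{\pi}_{\mathbf{x}}, P^{\pi}_{\mathbf{y}})$, and a hypothetical three-state MDP whose policy-induced transitions are deterministic. Under that assumption the Wasserstein term reduces to the distance between the two successor states, so no optimal-transport solver is needed. The names `bisim_update`, `bisim_fixed_point`, `rewards`, and `next_state` are illustrative only.

```python
# Sketch: iterate the pi-bisimulation update on a tiny, hypothetical MDP.
# Assumption: transitions under the policy are deterministic, so the
# Wasserstein distance between the two next-state distributions (both Dirac)
# equals d(next_state[x], next_state[y]).

def bisim_update(d, rewards, next_state, gamma):
    """One application of F(d)(x, y) = |r_x - r_y| + gamma * d(x', y')."""
    n = len(rewards)
    return [
        [
            abs(rewards[x] - rewards[y])
            + gamma * d[next_state[x]][next_state[y]]
            for y in range(n)
        ]
        for x in range(n)
    ]


def bisim_fixed_point(rewards, next_state, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Apply the update repeatedly, starting from d = 0, until it stops changing."""
    n = len(rewards)
    d = [[0.0] * n for _ in range(n)]
    for _ in range(max_iters):
        new_d = bisim_update(d, rewards, next_state, gamma)
        gap = max(abs(new_d[x][y] - d[x][y]) for x in range(n) for y in range(n))
        d = new_d
        if gap < tol:
            break
    return d


# Hypothetical chain 0 -> 1 -> 2 -> 2 (state 2 is absorbing).
# States 0 and 1 share the same immediate reward but differ one step later.
rewards = [1.0, 1.0, 0.0]
next_state = [1, 2, 2]
metric = bisim_fixed_point(rewards, next_state, gamma=0.9)

# d(0,1) = 0.9, d(0,2) = 1.9, d(1,2) = 1.0
print(metric[0][1], metric[0][2], metric[1][2])
```

In this toy chain, states 0 and 1 are indistinguishable by immediate reward alone, yet the fixed point assigns them a distance of 0.9 because their successors differ. With stochastic transitions, the Wasserstein term would have to be computed or approximated explicitly.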

Figures (13)

  • Figure 1: Overall architecture of SCR.
  • Figure 2: An example with two rollouts.
  • Figure 4: Aggregate metrics in the distraction setting.
  • Figure 5: Training curves of SCR and baseline methods in the distraction setting of DeepMind Control. Mean scores over 10 runs, with standard deviation shown as the shaded region. Training curves for all tasks are shown in the appendix.
  • Figure 6: Ablation study on cheetah-run (left) and walker-walk (right) in the distraction setting. Mean scores over 10 runs, with standard deviation shown as the shaded region.
  • ...and 8 more figures

Theorems & Definitions (25)

  • Theorem 2.1: $\pi$-bisimulation metric
  • Theorem 2.2: MICo distance
  • Theorem 3.1
  • proof
  • Definition 3.2: Diffuse metric (Castro et al., 2021)
  • Definition 3.3
  • Theorem 3.4
  • proof
  • Lemma 3.5: Non-zero self-distance
  • Theorem 3.6
  • ...and 15 more