Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards

Faisal Mohamed, Catherine Ji, Benjamin Eysenbach, Glen Berseth

TL;DR

This paper proposes an exploration method that uses temporal contrastive representations to guide exploration, prioritizing states with unpredictable future outcomes. It demonstrates that such representations enable the learning of complex exploratory behaviors in locomotion, manipulation, and embodied-AI tasks, revealing capabilities and behaviors that traditionally require extrinsic rewards.

Abstract

Effective exploration in reinforcement learning requires not only tracking where an agent has been, but also understanding how the agent perceives and represents the world. To learn powerful representations, an agent should actively explore states that contribute to its knowledge of the environment. Temporal representations can capture the information necessary to solve a wide range of potential tasks while avoiding the computational cost associated with full state reconstruction. In this paper, we propose an exploration method that leverages temporal contrastive representations to guide exploration, prioritizing states with unpredictable future outcomes. We demonstrate that such representations can enable the learning of complex exploratory behaviors in locomotion, manipulation, and embodied-AI tasks, revealing capabilities and behaviors that traditionally require extrinsic rewards. Unlike approaches that rely on explicit distance learning or episodic memory mechanisms (e.g., quasimetric-based methods), our method builds directly on temporal similarities, yielding a simpler yet effective strategy for exploration.
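
The idea of rewarding unpredictable futures from contrastive representations can be illustrated with a small sketch. This is not the authors' implementation: the embedding arrays `phi_sa` and `psi_future`, the inner-product similarity, the InfoNCE-style loss, and the negative-similarity reward are all assumptions chosen for illustration, and the paper's actual objective and reward may differ.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def infonce_loss(phi_sa, psi_future):
    """InfoNCE-style contrastive loss (assumed form, for illustration).

    phi_sa[i] is the embedding of a state-action pair and psi_future[i] is
    the embedding of a state observed later in the same trajectory.
    Row i of each array is a positive pair; all other rows act as negatives.
    """
    logits = phi_sa @ psi_future.T                      # (B, B) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def intrinsic_reward(phi_sa, psi_future):
    """Hypothetical intrinsic reward: negative similarity.

    A future state whose embedding is dissimilar to the current
    state-action embedding is treated as unpredictable and rewarded more.
    """
    return -np.sum(phi_sa * psi_future, axis=-1)
```

Under these assumptions, a future state whose embedding aligns with the current state-action embedding earns a lower reward than one whose embedding is orthogonal to it, which mirrors the stated goal of prioritizing states with unpredictable future outcomes.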

Paper Structure (43 sections, 26 equations, 29 figures, 8 tables, 1 algorithm)

Figures (29)

  • Figure 1: Curiosity-Driven Exploration via Temporal Contrastive Learning. We learn temporal representations so that the representation of $(s_0, a_0)$ is more similar to the representations of the future states $s_2, s_3, s_4, \ldots$. We reward the agent for visiting future states that seem unpredictable. For example, from state $s_0$, state $s_1$ should confer a lower reward than state $s_4$.
  • Figure 2: Environments. Maze coverage, robotic manipulation, and the survival game Craftax.
  • Figure 3: C-TeC performance compared to ETD [jiang2025episodic]. C-TeC is competitive with ETD in terms of state coverage in continuous-control environments and outperforms ETD in Crafter.
  • Figure 4: Evolution of the C-TeC reward during training. This figure shows how the intrinsic reward changes over the course of training based on future state visitation. The black circle in the lower-left corner represents the starting state. The C-TeC reward captures the agent's future state density and rewards the agent for visiting states far away in the future.
  • Figure 5: C-TeC explores more states than prior methods. We compare the state coverage of C-TeC to APT [liu2021behavior], RND [burda2018exploration], and ICM [pathak2017curiosity]. We also include a uniform random policy.
  • ...and 24 more figures