Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards

Faisal Mohamed; Catherine Ji; Benjamin Eysenbach; Glen Berseth

TL;DR

This paper proposes an exploration method that uses temporal contrastive representations to guide exploration, prioritizing states with unpredictable future outcomes. It demonstrates that such representations can enable the learning of complex exploratory behavior in locomotion, manipulation, and embodied-AI tasks, revealing capabilities and behaviors that traditionally require extrinsic rewards.

Abstract

Effective exploration in reinforcement learning requires not only tracking where an agent has been, but also understanding how the agent perceives and represents the world. To learn powerful representations, an agent should actively explore states that contribute to its knowledge of the environment. Temporal representations can capture the information necessary to solve a wide range of potential tasks while avoiding the computational cost associated with full state reconstruction. In this paper, we propose an exploration method that leverages temporal contrastive representations to guide exploration, prioritizing states with unpredictable future outcomes. We demonstrate that such representations can enable the learning of complex exploratory behavior in locomotion, manipulation, and embodied-AI tasks, revealing capabilities and behaviors that traditionally require extrinsic rewards. Unlike approaches that rely on explicit distance learning or episodic memory mechanisms (e.g., quasimetric-based methods), our method builds directly on temporal similarities, yielding a simpler yet effective strategy for exploration.
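The idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the embeddings are hand-written vectors rather than learned encoders, the function names are hypothetical, and the reward form (negative inner-product similarity between a state-action embedding and a future-state embedding) is an assumption chosen only to show how "unpredictable futures" could translate into a higher intrinsic reward.

```python
import math


def dot(u, v):
    """Inner product used as the similarity score between two embeddings."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(anchors, futures):
    """Batch InfoNCE loss: futures[i] is the positive for anchors[i];
    every other future in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, f) for f in futures]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(anchors)


def intrinsic_reward(anchor, future):
    """Assumed reward: negative similarity, so a future state that the
    representation considers unlikely from the anchor scores higher."""
    return -dot(anchor, future)


# Hypothetical embeddings: `near` resembles the anchor, `far` does not.
anchor = [1.0, 0.0]
near = [0.9, 0.1]
far = [0.1, 0.9]

reward_near = intrinsic_reward(anchor, near)
reward_far = intrinsic_reward(anchor, far)

# Correctly paired futures give a lower contrastive loss than swapped ones.
aligned_loss = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setting the dissimilar future receives the larger reward, which mirrors the Figure 1 statement that, from $s_0$, a distant state should be rewarded more than the immediate successor.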

Paper Structure (43 sections, 26 equations, 29 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Background
Exploration via Temporal Contrastive Learning
Training the contrastive model
Extracting an exploration signal from the contrastive model
Interpretation of C-TeC
Information-Theoretic Expression of C-TeC
Representations are Necessary for C-TeC to Succeed
Experiments
Comparison to ETD (Q1)
Leveraging the future state distribution for exploration (Q2, Q3)
Learning complex behavior in Craftax-Classic
Conclusion
Usage of large language models (LLMs)
...and 28 more sections

Figures (29)

Figure 1: Curiosity-Driven Exploration via Temporal Contrastive Learning. We learn temporal representations so that the representation of $(s_0, a_0)$ is more similar to $(s_{2,3,4,\ldots})$. We reward the agent for visiting future states that seem unpredictable. For example, from state $s_0$, state $s_1$ should confer lower reward than state $s_4$.
Figure 2: Environments. Maze coverage, robotic manipulation, and the survival game Craftax.
Figure 3: C-TeC performance compared to ETD (jiang2025episodic). C-TeC is competitive with ETD in terms of state coverage in continuous control environments and outperforms ETD in Crafter.
Figure 4: Evolution of the C-TeC reward during training. This figure shows how the intrinsic reward changes over the course of training based on future state visitation. The black circle in the lower-left corner represents the starting state. The C-TeC reward captures the agent's future state density and rewards the agent for visiting states far away in the future.
Figure 5: C-TeC explores more states than prior methods. We compare the state coverage of C-TeC to APT (liu2021behavior), RND (burda2018exploration), and ICM (pathak2017curiosity). We also include a uniform random policy.
...and 24 more figures
