TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations
Junik Bae, Kwanyoung Park, Youngwoon Lee
TL;DR
This work targets two weaknesses of unsupervised goal-conditioned reinforcement learning: limited state coverage and poor long-horizon goal-reaching. It introduces TLDR, a framework that learns temporal distance-aware representations and uses them in three places within a Go-Explore-inspired setup: selecting exploratory goals, computing intrinsic rewards, and training the goal-conditioned policy. Empirical results across six state-based and two pixel-based locomotion tasks show that TLDR achieves substantially broader state coverage and robust goal-reaching, and ablations confirm the value of temporal-distance signals for both exploration and policy learning. Limitations include slower learning in pixel-based environments and safety concerns when deploying on real robots, which point to future work on representation learning, model-based enhancements, and safety-aware deployment.
Abstract
Unsupervised goal-conditioned reinforcement learning (GCRL) is a promising paradigm for developing diverse robotic skills without external supervision. However, existing unsupervised GCRL methods often struggle to cover a wide range of states in complex environments because their exploration is limited and their goal-reaching rewards are sparse or noisy. To overcome these challenges, we propose a novel unsupervised GCRL method that leverages TemporaL Distance-aware Representations (TLDR). Based on temporal distance, TLDR selects faraway goals to initiate exploration and computes both intrinsic exploration rewards and goal-reaching rewards. Specifically, our exploration policy seeks states with large temporal distances (i.e., covering a large state space), while the goal-conditioned policy learns to minimize the temporal distance to the goal (i.e., reaching the goal). Our results in six simulated locomotion environments demonstrate that TLDR significantly outperforms prior unsupervised GCRL methods in covering a wide range of states.
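To make the roles of temporal distance concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a learned encoder has already mapped states to embeddings whose Euclidean distance approximates temporal distance; the function names, the k-nearest-neighbor novelty score, and the distance-reduction form of the goal-reaching reward are all assumptions made for illustration.

```python
import numpy as np


def temporal_distance(phi_a, phi_b):
    """Assumed proxy: Euclidean distance between embeddings stands in
    for the temporal distance between the corresponding states."""
    return np.linalg.norm(np.asarray(phi_a) - np.asarray(phi_b), axis=-1)


def knn_novelty(phi_visited, k=2):
    """Hypothetical novelty score: mean distance from each visited
    embedding to its k nearest other visited embeddings."""
    phi_visited = np.asarray(phi_visited, dtype=float)
    diffs = phi_visited[:, None, :] - phi_visited[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # After sorting each row, column 0 is the zero distance to itself.
    nearest = np.sort(dists, axis=1)[:, 1:k + 1]
    return nearest.mean(axis=1)


def select_faraway_goal(phi_visited, k=2):
    """Pick the index of the visited state that is temporally farthest
    from its neighbors, as a candidate goal to start exploration from."""
    return int(np.argmax(knn_novelty(phi_visited, k=k)))


def goal_reaching_reward(phi_s, phi_next, phi_goal):
    """Assumed reward shape: positive when a transition reduces the
    (approximate) temporal distance to the goal."""
    return float(
        temporal_distance(phi_s, phi_goal)
        - temporal_distance(phi_next, phi_goal)
    )


if __name__ == "__main__":
    # Three clustered embeddings and one isolated embedding.
    visited = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
    print("goal index:", select_faraway_goal(visited))
    print("reward:", goal_reaching_reward([0.0, 0.0], [1.0, 0.0], [3.0, 0.0]))
```

In this sketch, the same distance function serves both purposes described in the abstract: a large distance to neighbors marks a state as a promising exploratory goal, and a decrease in distance to the goal rewards the goal-conditioned policy.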
