Reinforcement Learning with Prototypical Representations
Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto
TL;DR
Proto-RL addresses the challenge of learning effective representations from images in RL by coupling prototypical latent representations with an intrinsic, entropy-based exploration signal. In a task-agnostic phase, it pre-trains an encoder and a set of prototypes, using a SwAV-inspired clustering objective and a nearest-neighbor entropy estimator in latent space to drive exploration. The learned representations transfer to unseen downstream tasks from the DeepMind Control Suite, enabling faster and more robust policy learning, particularly in sparse-reward settings, with improved state-space coverage. The results indicate that task-agnostic prototypical representations can improve downstream exploration and sample efficiency, offering a practical route toward more generalizable, fine-tunable RL systems.
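To make the nearest-neighbor entropy signal mentioned above concrete, the following is a minimal sketch of a k-nearest-neighbor intrinsic reward in latent space. It is an illustration under stated assumptions, not necessarily the estimator used in Proto-RL: the function name `knn_intrinsic_reward`, the choice `k=3`, and the `log(1 + d)` form are all hypothetical choices made for this example.

```python
import numpy as np

def knn_intrinsic_reward(latents, candidates, k=3):
    """Reward each latent by the distance to its k-th nearest candidate.

    latents:    (N, D) array of query embeddings.
    candidates: (M, D) array of reference embeddings, with M >= k.
    Returns an (N,) array; larger values indicate sparser regions.

    Hypothetical helper for illustration only.
    """
    # Pairwise Euclidean distances between queries and candidates, shape (N, M).
    diffs = latents[:, None, :] - candidates[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # k-th smallest distance in each row (index k-1 after a partial sort).
    # If a query also appears in `candidates`, its zero self-distance counts
    # as one of the k neighbors.
    kth = np.partition(dists, k - 1, axis=1)[:, k - 1]
    # log(1 + d) keeps the reward finite when the distance is zero.
    return np.log1p(kth)
```

Under this sketch, an embedding that lies far from previously seen embeddings receives a larger reward than one sitting in a dense cluster, which is the sense in which such a signal encourages broader state-space coverage.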
Abstract
Learning effective representations in image-based environments is crucial for sample-efficient Reinforcement Learning (RL). Unfortunately, in RL, representation learning is confounded with the exploratory experience of the agent: learning a useful representation requires diverse data, while effective exploration is only possible with coherent representations. Furthermore, we would like to learn representations that not only generalize across tasks but also accelerate downstream exploration for efficient task-specific training. To address these challenges, we propose Proto-RL, a self-supervised framework that ties representation learning to exploration through prototypical representations. These prototypes simultaneously serve as a summarization of the exploratory experience of an agent and as a basis for representing observations. We pre-train these task-agnostic representations and prototypes on environments without downstream task information. This enables state-of-the-art downstream policy learning on a set of difficult continuous control tasks.
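One way to picture prototypes acting as a basis for representing observations is to express each embedding as a soft assignment over the prototypes. SwAV-style clustering typically balances such assignments with a few Sinkhorn-Knopp iterations so that no single prototype absorbs the whole batch. The sketch below illustrates that idea only; the function name `sinkhorn_assign`, the temperature of 0.05, and the three iterations are assumptions for this example, not values taken from the paper.

```python
import numpy as np

def sinkhorn_assign(scores, n_iters=3, temperature=0.05):
    """Turn similarity scores into balanced soft assignments.

    scores: (B, K) similarities between B embeddings and K prototypes.
    Returns a (B, K) array whose rows each sum to 1, with mass spread
    approximately evenly across the K prototypes.

    Hypothetical helper for illustration only.
    """
    # Subtract the max before exponentiating for numerical stability.
    q = np.exp((scores - scores.max()) / temperature).T  # shape (K, B)
    q /= q.sum()
    n_protos, n_samples = q.shape
    for _ in range(n_iters):
        # Give every prototype the same total mass (1 / K).
        q /= q.sum(axis=1, keepdims=True)
        q /= n_protos
        # Give every sample the same total mass (1 / B).
        q /= q.sum(axis=0, keepdims=True)
        q /= n_samples
    # Rescale so that each sample's assignment sums to 1.
    return (q * n_samples).T

# Example usage with unit-normalized random embeddings and prototypes.
rng = np.random.default_rng(0)
z = rng.normal(size=(16, 8))
z /= np.linalg.norm(z, axis=1, keepdims=True)
protos = rng.normal(size=(4, 8))
protos /= np.linalg.norm(protos, axis=1, keepdims=True)
assignments = sinkhorn_assign(z @ protos.T)  # shape (16, 4)
```

In a SwAV-style objective, assignments like these would serve as targets for predicting the prototype distribution of a differently augmented view; that training loop is omitted here.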
