Decoupling Representation Learning from Reinforcement Learning
Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin
TL;DR
ATC introduces reward-free, contrastive representation learning that decouples encoder training from policy optimization in vision-based RL. The approach uses augmented temporal pairs and a momentum-contrastive setup to learn encodings that generalize across tasks and domains, often matching or surpassing end-to-end RL and other unsupervised learning (UL) baselines. Across DMControl, DMLab, and Atari, ATC demonstrates strong online performance, competitive offline pretraining, and partial transfer in multi-task settings, with ablations clarifying the roles of augmentation and temporal structure. This decoupled paradigm offers practical benefits for scalable, reusable representations in RL, including improved efficiency and greater flexibility in batch (offline) settings.
Abstract
In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. To this end, we introduce a new unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss. In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL in most environments. Additionally, we benchmark several leading UL algorithms by pre-training encoders on expert demonstrations and using them, with weights frozen, in RL agents; we find that agents using ATC-trained encoders outperform all others. We also train multi-task encoders on data from multiple environments and show generalization to different downstream RL tasks. Finally, we ablate components of ATC, and introduce a new data augmentation to enable replay of (compressed) latent images from pre-trained encoders when RL requires augmentation. Our experiments span visually diverse RL benchmarks in DeepMind Control, DeepMind Lab, and Atari, and our complete code is available at https://github.com/astooke/rlpyt/tree/master/rlpyt/ul.
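The UL task described in the abstract pairs an observation with one a short time difference later, augments both, and trains with a contrastive loss. The sketch below illustrates those three ingredients under stated assumptions. The random-shift augmentation (edge-replicate padding followed by a random crop), the plain dot-product score, and the helper names random_shift, temporal_pairs, and info_nce are all hypothetical choices for illustration, not the authors' implementation. Encoders are omitted; the loss operates on already-computed embeddings.

```python
import math
import random


def random_shift(img, pad=1, rng=random):
    """Hypothetical augmentation: pad a 2-D image by edge replication,
    then crop back to the original size at a random offset."""
    h, w = len(img), len(img[0])
    padded = [[img[min(max(r - pad, 0), h - 1)][min(max(c - pad, 0), w - 1)]
               for c in range(w + 2 * pad)]
              for r in range(h + 2 * pad)]
    dy, dx = rng.randint(0, 2 * pad), rng.randint(0, 2 * pad)
    return [row[dx:dx + w] for row in padded[dy:dy + h]]


def temporal_pairs(trajectory, k, batch_size, rng=random):
    """Sample (anchor, positive) observation pairs separated by k steps."""
    starts = [rng.randrange(len(trajectory) - k) for _ in range(batch_size)]
    return [(trajectory[t], trajectory[t + k]) for t in starts]


def info_nce(anchors, positives):
    """Contrastive (InfoNCE-style) loss over a batch of embeddings.

    Row i's positive is positives[i]; the other positives in the batch act
    as negatives. Scores are plain dot products (an assumption).
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [sum(a * p for a, p in zip(anchors[i], positives[j]))
                  for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target class i
    return total / n


# Usage: correctly matched pairs give a lower loss than mismatched ones.
matched = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In a full pipeline, the anchor embeddings would come from the online encoder applied to augmented observations at time t, and the positives from the momentum (target) encoder applied to augmented observations at t + k.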
