Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning
Chongyi Zheng, Jens Tuyls, Joanne Peng, Benjamin Eysenbach
TL;DR
The paper shows that METRA's success can be understood within the MISL framework: METRA's representation objective behaves like a contrastive loss, and its actor objective aligns with an information bottleneck. Building on these insights, it introduces Contrastive Successor Features (CSF), a simpler MISL method that learns representations via a contrastive lower bound and trains a policy with successor features, avoiding METRA's dual-gradient representation optimization. Experiments on six continuous-control tasks show that CSF performs close to the state of the art on both exploration and downstream tasks, and ablations highlight the importance of the information-bottleneck-based intrinsic reward and of the chosen representation parameterization. This work reinforces MISL as a viable foundation for unsupervised skill discovery and provides a practical, simpler alternative to Wasserstein-based approaches.
Abstract
Self-supervised learning has the potential to address several of the key challenges in reinforcement learning today, such as exploration, representation learning, and reward design. Recent work (METRA) has effectively argued that moving away from mutual information and instead optimizing a certain Wasserstein distance is important for good performance. In this paper, we argue that the benefits seen in that paper can largely be explained within the existing framework of mutual information skill learning (MISL). Our analysis suggests a new MISL method (contrastive successor features) that retains the excellent performance of METRA with fewer moving parts, and highlights connections between skill learning, contrastive representation learning, and successor features. Finally, through careful ablation studies, we provide further insight into some of the key ingredients for both our method and METRA.
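The text above describes CSF only at a high level: a contrastive lower bound for the representation and successor features for the policy. The following is a minimal NumPy sketch of those two ingredients under assumptions that this section does not state. The critic is assumed to be the inner product (phi(s') - phi(s))^T z, the bound is an InfoNCE-style estimate that uses the other skills in a batch as negatives, and the successor features psi are assumed to be bootstrapped on the feature "reward" phi(s') - phi(s) so that Q = psi^T z. All function names are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np

# ASSUMPTION: critic form and loss shape are illustrative, not taken from the paper text.

def contrastive_lower_bound(phi_s, phi_next, z):
    """InfoNCE-style lower bound on I(S, S'; Z) with in-batch negatives.

    phi_s, phi_next: (B, d) representations of s and s'.
    z: (B, d) skills; row i is the skill that generated transition i.
    """
    diff = phi_next - phi_s                      # (B, d) representation difference
    logits = diff @ z.T                          # logits[i, j] = (phi(s'_i) - phi(s_i))^T z_j
    positives = np.diag(logits)                  # scores of the matched (transition, skill) pairs
    # Numerically stable log-mean-exp over all skills in the batch (negative term).
    m = logits.max(axis=1, keepdims=True)
    log_mean_exp = (m + np.log(np.exp(logits - m).mean(axis=1, keepdims=True))).squeeze(1)
    return np.mean(positives - log_mean_exp)


def intrinsic_reward(phi_s, phi_next, z):
    """Per-transition intrinsic reward (phi(s') - phi(s))^T z (assumed form)."""
    return np.sum((phi_next - phi_s) * z, axis=1)


def sf_td_target(phi_s, phi_next, psi_next, gamma=0.99):
    """Successor-feature TD target: feature 'reward' plus discounted next SF.

    psi_next would come from a target network evaluated at (s', a', z) in practice.
    """
    return (phi_next - phi_s) + gamma * psi_next


def q_from_sf(psi, z):
    """Recover Q(s, a, z) = psi(s, a, z)^T z for each row."""
    return np.sum(psi * z, axis=1)
```

Because Q is linear in psi, regressing psi onto this target is equivalent to a TD update on Q with the intrinsic reward (phi(s') - phi(s))^T z, for every skill z at once. This is the usual motivation for using successor features when the reward is a linear function of the skill.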
