TAR: Teacher-Aligned Representations via Contrastive Learning for Quadrupedal Locomotion
Amr Mousa, Neil Karavis, Michele Caprio, Wei Pan, Richard Allmendinger
TL;DR
This work tackles generalization gaps in quadrupedal RL caused by misaligned privileged and proprioceptive representations and covariate shift. It proposes TAR, a framework that uses a privileged teacher to shape representations via a contrastive (triplet) objective, while the student learns through proprioceptive inputs and PPO, enabling efficient training and robust OOD generalization. A deployable fine-tuning path without privileged data allows real-world continual adaptation, validated by extensive simulation results and zero-shot hardware experiments on a Unitree Go2. TAR achieves faster training, better generalization than strong baselines, and practical deployability for real-world autonomous locomotion.
Abstract
Quadrupedal locomotion via Reinforcement Learning (RL) is commonly addressed using the teacher-student paradigm, where a privileged teacher guides a proprioceptive student policy. However, three key challenges lead to poor generalization in real-world scenarios: representation misalignment between the privileged teacher and the proprioceptive-only student, covariate shift due to behavioral cloning, and the lack of deployable adaptation. We propose Teacher-Aligned Representations via Contrastive Learning (TAR), a framework that leverages privileged information with self-supervised contrastive learning to bridge this gap. By aligning representations to a privileged teacher in simulation via contrastive objectives, our student policy learns structured latent spaces and exhibits robust generalization to Out-of-Distribution (OOD) scenarios, surpassing the fully privileged "Teacher". In our experiments, TAR reaches peak performance with 2x faster training than state-of-the-art baselines and generalizes 40% better on average in OOD scenarios than existing methods. Moreover, TAR transitions seamlessly into learning during deployment without requiring privileged states, setting a new benchmark in sample-efficient, adaptive locomotion and enabling continual fine-tuning in real-world scenarios. Open-source code and videos are available at https://amrmousa.com/TARLoco/.
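To make the idea of aligning student representations to a privileged teacher more concrete, the following is a minimal sketch of a generic triplet loss, not the TAR implementation. It assumes the anchor is a student embedding, the positive is the teacher embedding of the same sample, and the negative is the teacher embedding of a different sample in the batch. The function names, the negative-sampling rule, and the margin value are all hypothetical choices made for illustration.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull anchor toward positive, push it from negative."""
    return max(0.0, l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin)


def batch_triplet_loss(student_z, teacher_z, margin=1.0):
    """Mean triplet loss over a batch (assumed pairing, for illustration only).

    student_z[i] is the anchor, teacher_z[i] the positive, and the teacher
    embedding of the next sample in the batch serves as the negative.
    The negative is only meaningful for batches of at least two samples.
    """
    n = len(student_z)
    total = 0.0
    for i in range(n):
        j = (i + 1) % n  # hypothetical negative-sampling rule
        total += triplet_loss(student_z[i], teacher_z[i], teacher_z[j], margin)
    return total / n


# Student embeddings that match the teacher incur no loss;
# swapped embeddings are penalized.
teacher = [[0.0, 0.0], [3.0, 4.0]]
aligned_loss = batch_triplet_loss([[0.0, 0.0], [3.0, 4.0]], teacher)
misaligned_loss = batch_triplet_loss([[3.0, 4.0], [0.0, 0.0]], teacher)
```

In this toy batch the aligned student yields zero loss while the swapped one is penalized, which is the behavior a representation-alignment objective of this kind is meant to encourage. In a real training loop, such a term would be computed on network outputs with an automatic-differentiation framework and combined with the PPO objective.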
