Reference Grounded Skill Discovery
Seungeun Rho, Aaron Trinh, Danfei Xu, Sehoon Ha
TL;DR
Reference-Grounded Skill Discovery (RGSD) grounds high-DoF skill exploration in a semantically meaningful latent space learned from reference motions. The method first uses contrastive pretraining to map each reference motion to a direction on the unit sphere, then jointly pursues imitation and discovery within this grounded space. On a 69-DoF SMPL humanoid, RGSD achieves high-fidelity imitation and discovers coherent variations, outperforming state-of-the-art unsupervised and imitation baselines on motion fidelity and downstream style-controlled tasks. The results suggest that lightweight reference grounding provides a practical path toward semantically rich, structured skills for complex agents, with implications for scalable skill foundations in robotics.
Abstract
Scaling unsupervised skill discovery algorithms to high-DoF agents remains challenging. As dimensionality increases, the exploration space grows exponentially, while the manifold of meaningful skills remains limited. Semantic meaningfulness therefore becomes essential for guiding exploration effectively in high-dimensional spaces. In this work, we present **Reference-Grounded Skill Discovery (RGSD)**, a novel algorithm that grounds skill discovery in a semantically meaningful latent space using reference data. RGSD first performs contrastive pretraining to embed motions on a unit hypersphere, clustering each reference trajectory into a distinct direction. This grounding lets skill discovery combine imitation of reference behaviors with the discovery of semantically related, diverse behaviors. On a simulated SMPL humanoid with $359$-D observations and $69$-D actions, RGSD successfully imitates skills such as walking, running, punching, and sidestepping, while also discovering variations of these behaviors. In downstream locomotion tasks, RGSD leverages the discovered skills to faithfully satisfy user-specified style commands and outperforms imitation-learning baselines, which often fail to maintain the commanded style. Overall, our results suggest that lightweight reference grounding offers a practical path to discovering semantically rich and structured skills in high-DoF systems.
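To make the pretraining step more concrete, the following is a minimal NumPy sketch of a contrastive objective that pulls embeddings from the same reference trajectory toward a shared direction on the unit hypersphere. It is an illustration under stated assumptions, not the RGSD implementation: the abstract does not specify the loss, so an InfoNCE-style form is assumed here, and the names `normalize`, `info_nce_loss`, `traj_ids`, and the `temperature` value are all hypothetical.

```python
import numpy as np


def normalize(x, eps=1e-8):
    """Project each row of x onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce_loss(embeddings, traj_ids, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    Samples that share a trajectory id are treated as positives;
    all other samples in the batch act as negatives.
    """
    traj_ids = np.asarray(traj_ids)
    z = normalize(np.asarray(embeddings, dtype=float))
    n = z.shape[0]

    # Cosine similarity between all pairs, scaled by the temperature.
    logits = z @ z.T / temperature

    # Exclude each sample's similarity with itself.
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable row-wise log-softmax.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: same trajectory id, excluding the anchor itself.
    pos_mask = (traj_ids[:, None] == traj_ids[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # skip anchors that have no positive in the batch

    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[valid] / pos_counts[valid]
    return per_anchor.mean()


if __name__ == "__main__":
    ids = np.array([0, 0, 1, 1])
    # Each trajectory occupies its own direction: low loss expected.
    clustered = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
    # Same points, but trajectory members point in different directions.
    mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.01], [0.01, 1.0]])
    print("clustered loss:", info_nce_loss(clustered, ids))
    print("mixed loss:", info_nce_loss(mixed, ids))
```

In this sketch the loss is small when each trajectory's embeddings share one direction and large when they are spread across directions occupied by other trajectories, which is the clustering behavior the abstract describes. In practice the embeddings would come from a learned encoder over motion segments; that encoder is omitted here.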
