Reference Grounded Skill Discovery

Seungeun Rho, Aaron Trinh, Danfei Xu, Sehoon Ha

TL;DR

Reference-Grounded Skill Discovery (RGSD) grounds high-DoF skill exploration in a semantically meaningful latent space learned from reference motions. The method first performs contrastive pretraining to map each reference motion to a direction on the unit hypersphere, then jointly pursues imitation and discovery within this grounded space. On a SMPL humanoid with 69-D actions, RGSD achieves high-fidelity imitation and discovers coherent variations, outperforming state-of-the-art unsupervised and imitation baselines on motion fidelity and downstream style-controlled tasks. The results suggest that lightweight reference grounding provides a practical path toward semantically rich, structured skills for complex agents, with implications for scalable skill foundations in robotics.

Abstract

Scaling unsupervised skill discovery algorithms to high-DoF agents remains challenging. As dimensionality increases, the exploration space grows exponentially, while the manifold of meaningful skills remains limited. Therefore, semantic meaningfulness becomes essential to effectively guide exploration in high-dimensional spaces. In this work, we present **Reference-Grounded Skill Discovery (RGSD)**, a novel algorithm that grounds skill discovery in a semantically meaningful latent space using reference data. RGSD first performs contrastive pretraining to embed motions on a unit hypersphere, clustering each reference trajectory into a distinct direction. This grounding lets skill discovery combine imitation of reference behaviors with the discovery of semantically related, diverse behaviors. On a simulated SMPL humanoid with $359$-D observations and $69$-D actions, RGSD successfully imitates skills such as walking, running, punching, and sidestepping, while also discovering variations of these behaviors. In downstream locomotion tasks, RGSD leverages the discovered skills to faithfully satisfy user-specified style commands and outperforms imitation-learning baselines, which often fail to maintain the commanded style. Overall, our results suggest that lightweight reference-grounding offers a practical path to discovering semantically rich and structured skills in high-DoF systems.
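The contrastive pretraining step described in the abstract can be illustrated with a small sketch. The abstract does not specify the loss, the encoder, or any hyperparameters, so the following assumes an InfoNCE-style objective in which snippets from the same reference trajectory are positives. The names `l2_normalize`, `info_nce_loss`, `traj_ids`, and `temperature` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Project each row of x onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def info_nce_loss(z, traj_ids, temperature=0.1):
    """InfoNCE-style loss (an assumed objective, not taken from the paper).

    z:        (N, D) raw encoder outputs for N motion snippets.
    traj_ids: (N,) integer id of the reference trajectory of each snippet.
    Snippets sharing a trajectory id are treated as positives.
    """
    z = l2_normalize(z)
    sim = z @ z.T / temperature          # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)       # exclude self-pairs from the softmax

    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm

    # Positive mask: same trajectory, different snippet.
    pos = traj_ids[:, None] == traj_ids[None, :]
    np.fill_diagonal(pos, False)

    # Average log-probability of the positives for anchors that have any.
    has_pos = pos.any(axis=1)
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / pos.sum(axis=1)[has_pos]
    return per_anchor.mean()

# Toy usage: two "trajectories", three snippets each, in a 3-D latent space.
rng = np.random.default_rng(0)
ids = np.array([0, 0, 0, 1, 1, 1])
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
clustered = centers[ids] + 0.01 * rng.normal(size=(6, 3))
mixed = centers[np.array([0, 1, 0, 1, 0, 1])] + 0.01 * rng.normal(size=(6, 3))
print(info_nce_loss(clustered, ids), info_nce_loss(mixed, ids))
```

Under this assumed objective, the loss is lower when snippets from the same trajectory point in the same direction on the sphere, which matches the "one direction per reference trajectory" clustering the abstract describes.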

Paper Structure

This paper contains 48 sections, 29 equations, 9 figures, 2 tables, 2 algorithms.

Figures (9)

  • Figure 1: Comparison of learned skills from METRA and RGSD. Our method can discover structured skills in high-DoF systems.
  • Figure 2: We present the overall training pipeline of RGSD. It starts with contrastive pretraining of an encoder using reference motions, followed by parallel training of imitation and discovery.
  • Figure 3: Example skills. RGSD generates diverse behaviors when conditioned on different latent vectors. The figure shows motions from a single policy conditioned on distinct latent vectors sampled near the embedding of the (a) running, (b) backward, (c) sidestepping, and (d) punching motion.
  • Figure 4: Top-view trajectories of the robot’s base when conditioned on latent vectors sampled from the neighborhood of each motion embedding. For each method, we visualize 150 trajectories.
  • Figure 5: Training curves for the downstream task, along with corresponding FID scores.
  • ...and 4 more figures
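
The captions of Figures 3 and 4 refer to latent vectors sampled from the neighborhood of a motion embedding. The captions do not say how this sampling is done, so the sketch below assumes one simple option: add isotropic Gaussian noise to the reference embedding and re-project onto the unit hypersphere. The function `sample_near`, the parameter `noise_scale`, and the 8-D latent size are hypothetical and used only for illustration.

```python
import numpy as np

def sample_near(z_ref, num_samples, noise_scale=0.05, rng=None):
    """Sample unit vectors near z_ref (an assumed scheme, not from the paper).

    z_ref:       (D,) embedding of one reference motion.
    num_samples: number of latent vectors to draw.
    noise_scale: standard deviation of the Gaussian perturbation.
    """
    rng = np.random.default_rng() if rng is None else rng
    z_ref = z_ref / np.linalg.norm(z_ref)            # make sure the center is on the sphere
    noise = noise_scale * rng.normal(size=(num_samples, z_ref.shape[0]))
    z = z_ref + noise                                # perturb in ambient space
    return z / np.linalg.norm(z, axis=1, keepdims=True)  # project back to the sphere

# Toy usage: 150 latents around one embedding, matching the count in Figure 4.
rng = np.random.default_rng(0)
z_run = rng.normal(size=8)                           # stand-in for a "running" embedding
latents = sample_near(z_run, num_samples=150, rng=rng)
cosines = latents @ (z_run / np.linalg.norm(z_run))
print(latents.shape, cosines.min())
```

Each sampled row could then be passed to a latent-conditioned policy as the skill vector; a larger `noise_scale` would spread the samples further from the reference direction.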