Learning Parameterized Skills from Demonstrations

Vedant Gupta, Haotian Fu, Calvin Luo, Yiding Jiang, George Konidaris

TL;DR

DEPS tackles long-horizon generalization by learning parameterized skills from demonstrations through a three-level policy hierarchy (discrete skill $\boldsymbol{\rho}^K$, continuous parameter $\boldsymbol{\rho}^Z$, and low-level $\boldsymbol{\rho}^A$) guided by temporal variational inference over latent sequences $(\boldsymbol{\kappa}, \boldsymbol{\zeta})$. Skills are modeled as parameterized trajectory manifolds, with a 1D compressed state index $s'_t=\tanh(\boldsymbol{w}_{(k,z)} \cdot s^{\text{proj}}_t+b_{(k,z)})$ that promotes robust generalization and limits overfitting to raw observations. Empirically, DEPS achieves state-of-the-art rapid generalization on LIBERO and MetaWorld, particularly in out-of-distribution and low-data regimes, and yields interpretable skills such as grasp-location parameterizations. The approach promises improved data efficiency and transferability for robotic manipulation, while offering a framework to analyze and visualize learned trajectory manifolds and skill structures.
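To make the compressed state index concrete, the following is a minimal sketch of the formula $s'_t=\tanh(\boldsymbol{w}_{(k,z)} \cdot s^{\text{proj}}_t+b_{(k,z)})$ in plain Python. The summary does not say how $\boldsymbol{w}_{(k,z)}$ and $b_{(k,z)}$ are obtained from the skill and its parameters, so they are treated here as given inputs. The function name and the numeric values are hypothetical and chosen only for illustration.

```python
import math


def compressed_state_index(s_proj, w, b):
    """Compute s'_t = tanh(w . s_proj + b), a scalar in (-1, 1).

    s_proj: projected robot state at time t (list of floats).
    w, b:   weight vector and bias associated with the pair (k, z).
            How they are generated is an assumption left outside this sketch.
    """
    if len(s_proj) != len(w):
        raise ValueError("s_proj and w must have the same length")
    pre_activation = sum(wi * si for wi, si in zip(w, s_proj)) + b
    return math.tanh(pre_activation)


# Hypothetical values, for illustration only.
s_proj = [0.5, -1.0, 2.0]
w = [0.2, 0.4, 0.1]
b = 0.1
index = compressed_state_index(s_proj, w, b)
```

Because of the `tanh`, the index is bounded regardless of the scale of the projected state, so the low-level policy only ever sees a single number in a fixed range.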

Abstract

We present DEPS, an end-to-end algorithm for discovering parameterized skills from expert demonstrations. Our method learns parameterized skill policies jointly with a meta-policy that selects the appropriate discrete skill and continuous parameters at each timestep. Using a combination of temporal variational inference and information-theoretic regularization methods, we address the challenge of degeneracy common in latent variable models, ensuring that the learned skills are temporally extended, semantically meaningful, and adaptable. We empirically show that learning parameterized skills from multitask expert demonstrations significantly improves generalization to unseen tasks. Our method outperforms multitask as well as skill learning baselines on both LIBERO and MetaWorld benchmarks. We also demonstrate that DEPS discovers interpretable parameterized skills, such as an object grasping skill whose continuous arguments define the grasp location.
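The per-timestep control flow described above (pick a discrete skill, then its continuous parameters, then a primitive action) can be sketched as follows. This is an illustrative skeleton under stated assumptions, not the authors' implementation: the class name, the callable signatures, and the toy stand-in policies are all hypothetical, and in DEPS each component would be a learned model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Observation = Sequence[float]


@dataclass
class HierarchicalPolicy:
    # Discrete skill policy: full observation -> skill index k.
    skill_policy: Callable[[Observation], int]
    # Continuous-parameter policy: (observation, k) -> parameters z.
    parameter_policy: Callable[[Observation, int], List[float]]
    # Low-level policy: (compressed 1D state, k, z) -> primitive action.
    action_policy: Callable[[float, int, List[float]], List[float]]

    def act(
        self, observation: Observation, compressed_state: float
    ) -> Tuple[int, List[float], List[float]]:
        k = self.skill_policy(observation)
        z = self.parameter_policy(observation, k)
        # The low-level policy receives only the compressed state, not the raw observation.
        action = self.action_policy(compressed_state, k, z)
        return k, z, action


# Toy stand-ins (hypothetical), used only to exercise the control flow.
def toy_skill_policy(obs: Observation) -> int:
    return 0 if obs[0] < 0.5 else 1


def toy_parameter_policy(obs: Observation, k: int) -> List[float]:
    return [obs[1] * (k + 1)]


def toy_action_policy(s_prime: float, k: int, z: List[float]) -> List[float]:
    return [s_prime * z[0]]


policy = HierarchicalPolicy(toy_skill_policy, toy_parameter_policy, toy_action_policy)
k, z, action = policy.act([0.8, 0.25], compressed_state=0.5)
```

Keeping the three policies as separate callables mirrors the separation of concerns in the description: only the two high-level policies see the full observation, while the low-level policy is conditioned on the chosen skill, its parameters, and a compressed state.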

Paper Structure

This paper contains 45 sections, 6 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Three-level hierarchy of DEPS. The discrete skill policy selects a skill from the library given the full environment observation. Conditioned on that choice, the continuous-parameter policy outputs continuous parameters that modulate the chosen skill, tracing a trajectory manifold (illustrated on the left). Finally, the low-level action policy, which sees only a compressed one-dimensional robot state, produces the primitive action.
  • Figure 1: Average success rate across evaluation settings on LIBERO and MetaWorld-v2. All results are averaged across 5 seeds.
  • Figure 2: The underlying probabilistic graphical model of DEPS. The variational encoder has access to the entire trajectory, both past and future. The discrete and continuous policies together act as the high-level policy, which infers the parameterized skill from information at previous timesteps. The low-level subpolicy infers actions from the parameterized skill and the current state. Variables observable by each model are shaded in gray.
  • Figure 3: Skills as Parameterized Trajectory Manifolds. We hypothesize that a single skill corresponds to a family of parameterized trajectories. A one-dimensional state representation indexes into this generalizable manifold to predict actions.
  • Figure 4: Images of example tasks from LIBERO.
  • ...and 4 more figures