Learning Parameterized Skills from Demonstrations
Vedant Gupta, Haotian Fu, Calvin Luo, Yiding Jiang, George Konidaris
TL;DR
DEPS tackles long-horizon generalization by learning parameterized skills from demonstrations. It uses a three-level policy hierarchy: a discrete skill policy $\boldsymbol{\rho}^K$, a continuous parameter policy $\boldsymbol{\rho}^Z$, and a low-level policy $\boldsymbol{\rho}^A$. The hierarchy is trained with temporal variational inference over the latent sequences $(\boldsymbol{\kappa}, \boldsymbol{\zeta})$. Skills are modeled as parameterized trajectory manifolds indexed by a 1D compressed state $s'_t = \tanh(\boldsymbol{w}_{(k,z)} \cdot s^{\text{proj}}_t + b_{(k,z)})$, which encourages robust generalization and prevents overfitting to raw observations. Empirically, DEPS generalizes rapidly and outperforms multitask and skill-learning baselines on LIBERO and MetaWorld, particularly in out-of-distribution and low-data regimes. It also yields interpretable skills, such as a grasping skill whose continuous parameters define the grasp location. The approach promises improved data efficiency and transferability for robotic manipulation, and it offers a framework for analyzing and visualizing the learned trajectory manifolds and skill structure.
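To make the compressed state index concrete, here is a minimal sketch of the formula $s'_t = \tanh(\boldsymbol{w}_{(k,z)} \cdot s^{\text{proj}}_t + b_{(k,z)})$ in plain Python. The text does not specify how DEPS produces $\boldsymbol{w}_{(k,z)}$, $b_{(k,z)}$, or $s^{\text{proj}}_t$. The affine map `skill_weights` (with a hypothetical per-skill matrix `A_k` and offset `c_k`) is therefore an illustrative assumption, not the paper's parameterization.

```python
import math
from typing import List, Sequence


def skill_weights(A_k: Sequence[Sequence[float]], c_k: Sequence[float],
                  z: Sequence[float]) -> List[float]:
    """Hypothetical: w_(k,z) = A_k z + c_k, an affine map of the continuous parameter z.

    A_k and c_k stand in for whatever skill-conditioned network produces w_(k,z).
    """
    return [sum(a * zj for a, zj in zip(row, z)) + c for row, c in zip(A_k, c_k)]


def compressed_state_index(w: Sequence[float], b: float,
                           s_proj: Sequence[float]) -> float:
    """Compute s'_t = tanh(w_(k,z) . s_proj_t + b_(k,z)), a scalar bounded in [-1, 1]."""
    if len(w) != len(s_proj):
        raise ValueError("w and s_proj must have the same dimension")
    dot = sum(wi * si for wi, si in zip(w, s_proj))
    return math.tanh(dot + b)


if __name__ == "__main__":
    # Toy skill k with a 2-D continuous parameter z and a 2-D projected state.
    A_k = [[1.0, 0.0], [0.0, 1.0]]
    c_k = [0.0, 0.0]
    z = [0.8, -0.3]
    w = skill_weights(A_k, c_k, z)
    # Map each projected state along a toy trajectory to its 1D index.
    trajectory = [[0.0, 0.0], [0.5, 0.1], [1.0, 0.2], [1.5, 0.3]]
    for t, s_proj in enumerate(trajectory):
        print(t, round(compressed_state_index(w, 0.0, s_proj), 4))
```

The `tanh` keeps each index bounded. One way to read this is that every (k, z) pair collapses a projected state to a single scalar position along its trajectory manifold.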
Abstract
We present DEPS, an end-to-end algorithm for discovering parameterized skills from expert demonstrations. Our method learns parameterized skill policies jointly with a meta-policy that selects the appropriate discrete skill and continuous parameters at each timestep. Using a combination of temporal variational inference and information-theoretic regularization methods, we address the challenge of degeneracy common in latent variable models, ensuring that the learned skills are temporally extended, semantically meaningful, and adaptable. We empirically show that learning parameterized skills from multitask expert demonstrations significantly improves generalization to unseen tasks. Our method outperforms multitask as well as skill learning baselines on both LIBERO and MetaWorld benchmarks. We also demonstrate that DEPS discovers interpretable parameterized skills, such as an object grasping skill whose continuous arguments define the grasp location.
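The abstract describes a meta-policy that selects a discrete skill and its continuous parameters at each timestep, with a low-level skill policy producing the action. Below is a minimal sketch of one such decision step. It assumes a softmax over skill logits for the discrete choice, a fixed-variance Gaussian for the continuous parameters, and a deterministic low-level policy. The class `ParameterizedSkillHierarchy`, its fields, and these distribution choices are hypothetical stand-ins for the learned networks, not the DEPS implementation.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]


@dataclass
class ParameterizedSkillHierarchy:
    # Hypothetical toy components; in DEPS these are learned policies.
    skill_logits: Callable[[State], List[float]]                 # discrete skill policy (rho^K)
    param_mean: Callable[[State, int], List[float]]              # mean of continuous parameter policy (rho^Z)
    param_std: float                                             # assumed fixed std for rho^Z
    low_level: Callable[[State, int, List[float]], List[float]]  # assumed deterministic low-level policy (rho^A)

    def step(self, state: State,
             rng: random.Random) -> Tuple[int, List[float], List[float]]:
        # 1) Sample a discrete skill k from a softmax over the skill logits.
        logits = self.skill_logits(state)
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        k = rng.choices(range(len(logits)), weights=weights, k=1)[0]
        # 2) Sample continuous parameters z ~ N(mean(s, k), std^2), one per dimension.
        z = [rng.gauss(mu, self.param_std) for mu in self.param_mean(state, k)]
        # 3) The low-level policy maps (s, k, z) to an action.
        action = self.low_level(state, k, z)
        return k, z, action


if __name__ == "__main__":
    hierarchy = ParameterizedSkillHierarchy(
        skill_logits=lambda s: [1.0, 0.0, -1.0],       # e.g. three skills
        param_mean=lambda s, k: [0.1 * k, -0.1 * k],   # skill-dependent parameter mean
        param_std=0.05,
        low_level=lambda s, k, z: [si + zi for si, zi in zip(s, z)],
    )
    rng = random.Random(0)
    state = [0.0, 0.0]
    for t in range(3):
        k, z, action = hierarchy.step(state, rng)
        print(t, k, [round(v, 3) for v in z], [round(v, 3) for v in action])
        state = action  # toy dynamics: the action becomes the next state
```

Resampling k and z at every step mirrors the abstract's "at each timestep" wording. Any mechanism that keeps skills temporally extended (which the abstract attributes to temporal variational inference and information-theoretic regularization) is deliberately left out of this sketch.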
