Learning and Retrieval from Prior Data for Skill-based Imitation Learning
Soroush Nasiriany, Tian Gao, Ajay Mandlekar, Yuke Zhu
TL;DR
The paper tackles data-inefficient imitation learning for long-horizon robotic manipulation by introducing SAILOR, a two-phase framework that first learns a latent, predictable space of temporally extended skills from large prior datasets and then trains a target-task policy using retrieved, task-relevant prior sub-trajectories to augment supervision. A temporal-predictability objective shapes the skill space, while a retrieval-based mechanism selects relevant prior experiences to improve policy learning. Empirical results in simulated Franka Kitchen and CALVIN domains, plus real-world kitchen tasks, show substantial gains over behavioral cloning and offline RL baselines, with ablations highlighting the importance of both the predictability objective and data retrieval. The approach reduces target-task data requirements and demonstrates robust transfer from diverse prior data to new manipulation tasks, marking a step toward scalable, data-efficient robot learning.
Abstract
Imitation learning offers a promising path for robots to learn general-purpose behaviors, but traditionally has exhibited limited scalability due to high data supervision requirements and brittle generalization. Inspired by recent advances in multi-task imitation learning, we investigate the use of prior data from previous tasks to facilitate learning novel tasks in a robust, data-efficient manner. To make effective use of the prior data, the robot must internalize knowledge from past experiences and contextualize this knowledge in novel tasks. To that end, we develop a skill-based imitation learning framework that extracts temporally extended sensorimotor skills from prior data and subsequently learns a policy for the target task that invokes these learned skills. We identify several key design choices that significantly improve performance on novel tasks, namely representation learning objectives to enable more predictable skill representations and a retrieval-based data augmentation mechanism to increase the scope of supervision for policy training. On a collection of simulated and real-world manipulation domains, we demonstrate that our method significantly outperforms existing imitation learning and offline reinforcement learning approaches. Videos and code are available at https://ut-austin-rpl.github.io/sailor
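As an illustration of the retrieval-based data augmentation idea, the following is a minimal sketch rather than the released implementation. It assumes that each sub-trajectory has already been encoded into a fixed-size skill embedding, and that relevance is measured by Euclidean distance to the nearest target-task embedding. Both are assumptions made for this example, and the names `l2_distance` and `retrieve_prior_indices` are hypothetical.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_prior_indices(
    target_embeddings: Sequence[Sequence[float]],
    prior_embeddings: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k prior sub-trajectories closest to the target data.

    Assumption: a prior sub-trajectory is scored by its distance to the
    nearest target-task embedding; smaller scores are treated as more relevant.
    """
    scored = []
    for idx, prior in enumerate(prior_embeddings):
        nearest = min(l2_distance(prior, t) for t in target_embeddings)
        scored.append((nearest, idx))
    scored.sort()  # ascending by distance, ties broken by index
    return [idx for _, idx in scored[:k]]


# Toy 2-D embeddings (made up for illustration only).
target = [[0.0, 0.0], [1.0, 1.0]]
prior = [[0.1, 0.0], [5.0, 5.0], [1.0, 0.8], [-3.0, 2.0]]
selected = retrieve_prior_indices(target, prior, k=2)
print(selected)
```

In this sketch, the selected prior sub-trajectories would then be added to the target-task demonstrations when training the policy, which is one simple way to widen the scope of supervision.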
