
Learning and Retrieval from Prior Data for Skill-based Imitation Learning

Soroush Nasiriany, Tian Gao, Ajay Mandlekar, Yuke Zhu

TL;DR

The paper tackles data-inefficient imitation learning for long-horizon robotic manipulation by introducing SAILOR, a two-phase framework that first learns a latent, predictable space of temporally extended skills from large prior datasets and then trains a target-task policy using retrieved, task-relevant prior sub-trajectories to augment supervision. A temporal-predictability objective shapes the skill space, while a retrieval-based mechanism selects relevant prior experiences to improve policy learning. Empirical results in simulated Franka Kitchen and CALVIN domains, plus real-world kitchen tasks, show substantial gains over behavioral cloning and offline RL baselines, with ablations highlighting the importance of both the predictability objective and data retrieval. The approach reduces target-task data requirements and demonstrates robust transfer from diverse prior data to new manipulation tasks, marking a step toward scalable, data-efficient robot learning.
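The retrieval step mentioned above can be pictured as a nearest-neighbour search in the learned skill space. The sketch below illustrates the idea under stated assumptions and is not the authors' code. It assumes every sub-trajectory has already been encoded into a latent vector by the skill encoder. It uses Euclidean distance as the similarity measure and keeps the k closest prior segments for each target segment. The function `retrieve_prior_segments` and the parameter `k` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two equal-length latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_prior_segments(
    target_latents: List[Vector],
    prior_latents: List[Vector],
    k: int,
) -> List[int]:
    """Return sorted indices of prior sub-trajectories to retrieve.

    Hypothetical rule (an assumption, not the paper's exact procedure):
    for every target-task sub-trajectory latent, take the k nearest prior
    latents; the union of those indices is the retrieved set.
    """
    selected = set()
    for z_target in target_latents:
        # Rank all prior segments by their distance to this target segment.
        ranked = sorted(
            range(len(prior_latents)),
            key=lambda i: euclidean(z_target, prior_latents[i]),
        )
        selected.update(ranked[:k])
    return sorted(selected)


# Toy usage with 2-D latents.
target = [[0.0, 0.0], [5.0, 5.0]]
prior = [[0.1, 0.0], [4.9, 5.1], [10.0, 10.0], [0.0, 0.3]]
print(retrieve_prior_segments(target, prior, k=1))
```

The retrieved indices would then select prior sub-trajectories to pool with the target demonstrations for policy training. Ties are broken by prior-dataset order because Python's `sorted` is stable.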

Abstract

Imitation learning offers a promising path for robots to learn general-purpose behaviors, but traditionally has exhibited limited scalability due to high data supervision requirements and brittle generalization. Inspired by recent advances in multi-task imitation learning, we investigate the use of prior data from previous tasks to facilitate learning novel tasks in a robust, data-efficient manner. To make effective use of the prior data, the robot must internalize knowledge from past experiences and contextualize this knowledge in novel tasks. To that end, we develop a skill-based imitation learning framework that extracts temporally extended sensorimotor skills from prior data and subsequently learns a policy for the target task that invokes these learned skills. We identify several key design choices that significantly improve performance on novel tasks, namely representation learning objectives to enable more predictable skill representations and a retrieval-based data augmentation mechanism to increase the scope of supervision for policy training. On a collection of simulated and real-world manipulation domains, we demonstrate that our method significantly outperforms existing imitation learning and offline reinforcement learning approaches. Videos and code are available at https://ut-austin-rpl.github.io/sailor
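To make the skill-learning objective more concrete, here is a minimal per-sample loss that uses only the standard library. It assumes a VAE-style objective: action reconstruction plus a KL term toward a standard normal prior. It models the temporal-predictability term as a squared error between the encoder mean and a latent predicted from the observations preceding the sub-trajectory. The weights `beta` and `lam`, the squared-error form of the predictability term, and all function names are assumptions for illustration. The paper's exact formulation may differ, and a real implementation would compute these terms with an autodiff framework.

```python
import math
from typing import Sequence


def gaussian_kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def skill_loss(
    actions: Sequence[float],        # flattened action sub-trajectory
    recon_actions: Sequence[float],  # decoder output for the same sub-trajectory
    mu: Sequence[float],             # encoder mean of the latent skill
    logvar: Sequence[float],         # encoder log-variance of the latent skill
    predicted_z: Sequence[float],    # latent predicted from preceding observations
    beta: float = 0.01,              # hypothetical KL weight
    lam: float = 1.0,                # hypothetical predictability weight
) -> float:
    """Reconstruction + beta * KL + lam * temporal-predictability (assumed form)."""
    recon = mse(actions, recon_actions)
    kl = gaussian_kl_to_standard_normal(mu, logvar)
    # Assumed predictability term: latents that are easy to predict from the
    # preceding observations incur a smaller penalty.
    pred = mse(predicted_z, mu)
    return recon + beta * kl + lam * pred
```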

Paper Structure (30 sections, 4 equations, 4 figures, 9 tables, 2 algorithms)

Figures (4)

  • Figure 1: Overview. We present a skill-based imitation learning framework that uses prior data to effectively learn novel tasks. First, we learn a latent skill model on the prior data, with objectives to ensure a predictable skill representation. Given target task demonstrations, we use this latent space to retrieve similar behaviors from the prior data, expanding supervision for the policy. We then train a policy which outputs latent skills.
  • Figure 2: Model Overview. Our method consists of a skill learning and policy learning phase. (Left) In the skill learning phase, we learn a latent skill representation of sub-trajectories via a variational autoencoder. We include an additional temporal predictability term to learn a more consistent latent representation. (Right) In the policy learning phase, we train the policy to predict the latent skill given a history of observations preceding the sub-trajectory. To execute the policy, we decode the predicted latent using the skill decoder.
  • Figure 3: Simulated Tasks. We perform extensive evaluations on two simulation domains. (Left) Franka Kitchen: our target task involves a specific permutation of four subtasks, and we consider two prior datasets: demonstrations involving all subtasks, and demonstrations involving all subtasks except opening the microwave. (Right) CALVIN: we adopt the play dataset of Mees et al. (2021) as our prior data and perform evaluations on two target tasks: setting up the playroom environment and, conversely, cleaning up the environment.
  • Figure 4: Real-World Tasks. On the left, we illustrate the set of objects we use for collecting the play dataset. On the right, we show two of the three target tasks: setting up breakfast and cooking.
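The execution procedure in Figure 2 (right) can be sketched as a loop that alternates between predicting a latent skill and decoding it into low-level actions. This is a hypothetical illustration. The policy, decoder, and environment are passed in as plain callables. The decoder is assumed to map a latent to a fixed-length chunk of actions that are executed open-loop. `history_len` is an assumed length for the observation history.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Obs = Sequence[float]
Latent = List[float]
Action = List[float]


def rollout(
    env_step: Callable[[Action], Obs],              # hypothetical: applies an action, returns next obs
    initial_obs: Obs,
    policy: Callable[[List[Obs]], Latent],          # observation history -> latent skill
    decoder: Callable[[Latent], List[Action]],      # latent skill -> chunk of actions (assumed)
    num_skills: int,
    history_len: int = 2,                           # assumed history length
) -> List[Action]:
    """Run the policy for num_skills skills and return every executed action."""
    # Keep only the most recent history_len observations for the policy.
    history: Deque[Obs] = deque([initial_obs], maxlen=history_len)
    executed: List[Action] = []
    for _ in range(num_skills):
        z = policy(list(history))   # predict the next latent skill
        actions = decoder(z)        # decode it with the skill decoder
        for a in actions:           # execute the chunk before re-querying
            obs = env_step(a)
            history.append(obs)
            executed.append(a)
    return executed
```

Re-querying the policy only after each decoded chunk finishes reflects the temporally extended nature of the skills. A decoder conditioned on observations would be a natural variation.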