PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-Efficient Imitation Learning

Tian Gao, Soroush Nasiriany, Huihan Liu, Quantao Yang, Yuke Zhu

TL;DR

PRIME tackles the data inefficiency of imitation learning for long-horizon manipulation by scaffolding tasks with a fixed set of behavior primitives and learning a high-level policy that sequences them. A self-supervised data collection procedure trains an inverse dynamics model (IDM) that maps a pair of states to a primitive type and its parameters, and a dynamic-programming trajectory parser uses the IDM to convert demonstrations into primitive sequences without segmentation labels. The policy is trained by imitation on the parsed sequences, aided by suffix-based data augmentation and pretraining on IDM data. In simulation and on real robots, PRIME achieves substantial gains over state-of-the-art baselines (10-34% higher success rates in simulation, 20-48% on physical hardware) and demonstrates strong generalization and recovery capabilities, though the sim-to-real gap remains a challenge. Overall, PRIME provides a practical, data-efficient framework for primitive-based imitation in tabletop manipulation, with promising directions in expanding its primitive library and applying curriculum learning.

Abstract

Imitation learning has shown great potential for enabling robots to acquire complex manipulation behaviors. However, these algorithms suffer from high sample complexity in long-horizon tasks, where compounding errors accumulate over the task horizon. We present PRIME (PRimitive-based IMitation with data Efficiency), a behavior primitive-based framework designed for improving the data efficiency of imitation learning. PRIME scaffolds robot tasks by decomposing task demonstrations into primitive sequences, followed by learning a high-level control policy to sequence primitives through imitation learning. Our experiments demonstrate that PRIME achieves a significant performance improvement in multi-stage manipulation tasks, with 10-34% higher success rates in simulation over state-of-the-art baselines and 20-48% on physical hardware.

Paper Structure (23 sections, 2 equations, 6 figures, 4 tables, 1 algorithm)

Figures (6)

  • Figure 1: Overview of PRIME. (Left) Our learning framework leverages a set of pre-built behavior primitives to scaffold manipulation tasks. (Middle) Given task demonstrations, we use a trajectory parser to parse each demonstration into a sequence of primitive types (such as "push", "grasp" and "place") and their corresponding parameters $x_i$. (Right) With these parsed sequences of primitives, we use imitation learning to acquire a policy capable of predicting primitive types (such as "grasp") and corresponding parameters $x$ based on observations.
  • Figure 2: Method Overview. We develop a self-supervised data collection procedure that randomly executes sequences of behavior primitives in the environment. With the generated dataset, we train an IDM that maps an initial state $s$ and a final state $s'$ from segments in task demonstrations to a primitive type $p$ and corresponding parameters $x$. To derive the optimal primitive sequences, we build a trajectory parser capable of parsing task demonstrations into primitive sequences using the learned IDM. Finally, we train the policy using parsed primitive sequences.
  • Figure 3: Simulated Tasks. We perform evaluations on three tasks from the RoboSuite simulator [zhu2020robosuite]. The first two, PickPlace and NutAssembly, are from the RoboSuite benchmark, with NutAssembly featuring less initial randomization than the original task. We introduce a third task, TidyUp, to study long-horizon tasks and test the inverse dynamics model's generalization to unseen environments. We create four environment variants in this domain, denoted as (A, B, C, D). The TidyUp task is designed in environment (D), and we collect human demonstrations for TidyUp in the same environment (D). To gauge the inverse dynamics model's generalization capability, we train two IDMs: IDM-D, trained solely on data from environment (D), and IDM-ABC, trained on data from environments (A, B, C). While IDM-D is our default model for experiments, we use IDM-ABC to evaluate generalization in unseen environments.
  • Figure 4: Quantitative evaluation in three simulated tasks. Our method significantly outperforms state-of-the-art imitation learning approaches, with success rates surpassing 95% in all three tasks.
  • Figure 5: Visualization of output primitive sequences from trajectory parser. For each task, we select five human demonstrations and visualize the segmented primitive sequences as interpreted by the trajectory parser.
  • ...and 1 more figure
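
The trajectory parser described in Figures 1 and 2 can be illustrated with a minimal dynamic-programming sketch. The summary above does not give the paper's exact objective, so this example assumes the parser picks the segmentation that maximizes the sum of per-segment log-probabilities returned by the IDM. The names parse_trajectory, toy_idm, and max_len are hypothetical, the states are 1-D toy positions rather than robot observations, and the toy IDM returns only a primitive type (the real IDM also predicts parameters $x$).

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical IDM interface (an assumption, not the paper's API):
# given a segment's start and end state, return (primitive_type, log_score).
ScoreFn = Callable[[float, float], Tuple[str, float]]


def parse_trajectory(
    states: Sequence[float], score: ScoreFn, max_len: int
) -> List[Tuple[int, int, str]]:
    """Split states[0..T] into consecutive segments with maximal total score."""
    T = len(states) - 1
    best = [-math.inf] * (T + 1)  # best[j]: best total score covering states[0..j]
    back: List[Optional[Tuple[int, str]]] = [None] * (T + 1)
    best[0] = 0.0
    for j in range(1, T + 1):
        # Try every admissible start index i for a last segment ending at j.
        for i in range(max(0, j - max_len), j):
            prim, logp = score(states[i], states[j])
            if best[i] + logp > best[j]:
                best[j] = best[i] + logp
                back[j] = (i, prim)
    # Follow back-pointers from the final state to recover the segments.
    segments: List[Tuple[int, int, str]] = []
    j = T
    while j > 0:
        i, prim = back[j]
        segments.append((i, j, prim))
        j = i
    segments.reverse()
    return segments


def toy_idm(s_start: float, s_end: float) -> Tuple[str, float]:
    """Toy stand-in for a learned IDM: each primitive 'explains' one displacement."""
    targets = {"grasp": 2.0, "place": 3.0}  # made-up values for illustration
    d = s_end - s_start
    prim = min(targets, key=lambda p: abs(d - targets[p]))
    return prim, -abs(d - targets[prim])


demo_states = [0.0, 0.9, 2.0, 3.1, 4.0, 5.0]
segments = parse_trajectory(demo_states, toy_idm, max_len=5)
print(segments)  # [(0, 2, 'grasp'), (2, 5, 'place')]
```

Each tuple is (start index, end index, primitive type). The parsed sequence is what the high-level policy would then be trained to imitate; bounding the segment length with max_len keeps the search quadratic at worst and is a design choice of this sketch, not a detail taken from the paper.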