Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels

Abhay Deshpande; Liyiming Ke; Quinn Pfeifer; Abhishek Gupta; Siddhartha S. Srinivasa

TL;DR

This work provides the first empirical validation that CCIL can significantly improve imitation learning performance despite discontinuities present in contact-rich manipulation and demonstrates CCIL’s practicality for alleviating compounding errors in imitation learning on physical robots.

Abstract

We consider imitation learning with access only to expert demonstrations, whose real-world application is often limited by covariate shift due to compounding errors during execution. We investigate the effectiveness of the Continuity-based Corrective Labels for Imitation Learning (CCIL) framework in mitigating this issue for real-world fine manipulation tasks. CCIL generates corrective labels by learning a locally continuous dynamics model from demonstrations to guide the agent back toward expert states. Through extensive experiments on peg insertion and fine grasping, we provide the first empirical validation that CCIL can significantly improve imitation learning performance despite discontinuities present in contact-rich manipulation. We find that: (1) real-world manipulation exhibits sufficient local smoothness to apply CCIL, (2) generated corrective labels are most beneficial in low-data regimes, and (3) label filtering based on estimated dynamics model error enables performance gains. To effectively apply CCIL to robotic domains, we offer a practical instantiation of the framework and insights into design choices and hyperparameter selection. Our work demonstrates CCIL's practicality for alleviating compounding errors in imitation learning on physical robots.
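The label-generation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D model `f_hat`, the action perturbation of `0.2`, and the trust-region radius `delta` are all hypothetical choices made for illustration. Given one expert transition, the sketch perturbs the expert action and solves for a nearby state from which the learned model predicts that the perturbed action lands on the expert's next state. The resulting (state, action) pair is a corrective label that steers back toward the demonstration.

```python
import math

def f_hat(s, a):
    """Hypothetical learned dynamics: predicts the next state (toy, 1-D)."""
    return s + 0.1 * a + 0.05 * math.sin(s)

def corrective_state(s_star, a_label, s_next_star, iters=20, h=1e-6):
    """Search near s_star for a state s_g with f_hat(s_g, a_label) ~= s_next_star.

    Uses Newton's method with a finite-difference derivative, started at the
    expert state so the solution stays in its neighborhood.
    """
    s_g = s_star
    for _ in range(iters):
        residual = f_hat(s_g, a_label) - s_next_star
        slope = (f_hat(s_g + h, a_label) - f_hat(s_g - h, a_label)) / (2 * h)
        s_g -= residual / slope
    return s_g

# One expert transition (s*_t, a*_t, s*_{t+1}), generated here from the toy model.
s_star, a_star = 0.5, 1.0
s_next_star = f_hat(s_star, a_star)

# Perturb the expert action slightly and solve for the matching nearby state.
a_label = a_star + 0.2
s_g = corrective_state(s_star, a_label, s_next_star)

# Keep the label only if it lies inside an assumed trust region around s_star.
delta = 0.1
label = (s_g, a_label) if abs(s_g - s_star) < delta else None
print(label)
```

In this toy setting the recovered state sits slightly behind the expert state, and the larger action compensates for the gap. A real system would replace `f_hat` with a dynamics model trained on the demonstrations and would work with vector-valued states and actions.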

Paper Structure (20 sections, 1 theorem, 9 equations, 6 figures, 1 algorithm)

Introduction
Continuity-based corrective labels
Notation for Imitation Learning and Behavior Cloning
Continuity-based Corrective Labels
Our Practical Instantiation of CCIL
Experiment Design
Motivation
Hypotheses
Hardware
Tasks and Data Collection
Training
Evaluation
Results
Corrective Labels' Improvement to Imitation Learning
CCIL's Assumptions on Local Lipschitz Continuity
...and 5 more sections

Key Result

Theorem II.2

Suppose the dynamics model has a bounded training error $\epsilon$ on the training data. If the learned dynamics $\hat{f}$ and the true dynamics $f$ are locally $K_1$-Lipschitz and $K_2$-Lipschitz, respectively, within a neighborhood of $(s^*_t,a^*_t)$ of size $\delta$, and $\|s^\mathcal{G}_t-s^*_t\|<\delta$, then

Figures (6)

Figure 1: The three tasks we consider, GearInsertion, GraspCoin, and GraspCube, along with our three task objects: a coin, a Lego gear, and a small cube.
Figure 2: System overview. (a) Our HEBI-based 7-DOF robot with a chopstick end-effector. (b) Teleoperation mimicking leader chopsticks tracked using a motion-capture cage.
Figure 3: (a) As the amount of data increases, CCIL provides a smaller boost over behavior cloning. We use asterisks to denote statistical significance: * $p < 0.1$, ** $p < 0.05$, *** $p < 0.01$, and **** $p < 0.001$; ns denotes $p \geq 0.1$. (b) Mean and middle 95% of the local Lipschitz coefficients of the learned dynamics model across the demonstration dataset as the Lipschitz constraint increases. As the enforced constraint increases, the distribution of coefficients converges.
Figure 4: (a) The 20% dataset for the GraspCube task reveals two distinct clusters of corrective labels. (b) The green cluster mainly corresponds to labels where the cube is being manipulated, and the blue cluster to free-space arm motion. (c) Policies trained using only the blue cluster (low label rejection threshold) are more successful than those trained with the green cluster or both (high label rejection threshold).
Figure 5: Policy performance when filtering out varying fractions of the generated labels due to label error.
...and 1 more figure
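The label filtering referred to in the Figure 4 and Figure 5 captions can be sketched as follows. This is an illustrative assumption about how such a filter might look, not the paper's code: the function name `filter_labels`, the placeholder label names, and the error values are hypothetical, and the sketch does not show how the per-label error is estimated. It rejects a chosen fraction of generated labels, starting with those whose estimated error is largest.

```python
def filter_labels(labels, errors, reject_fraction):
    """Drop the `reject_fraction` of labels with the largest estimated error."""
    if not 0.0 <= reject_fraction <= 1.0:
        raise ValueError("reject_fraction must be in [0, 1]")
    n_keep = len(labels) - int(round(reject_fraction * len(labels)))
    # Sort indices by error, keep the lowest-error ones, restore original order.
    order = sorted(range(len(labels)), key=lambda i: errors[i])
    kept = sorted(order[:n_keep])
    return [labels[i] for i in kept]

# Placeholder labels and made-up error estimates, for illustration only.
labels = ["l0", "l1", "l2", "l3", "l4"]
errors = [0.02, 0.50, 0.01, 0.30, 0.05]
kept = filter_labels(labels, errors, reject_fraction=0.4)
print(kept)  # ['l0', 'l2', 'l4']
```

Sweeping `reject_fraction` from 0 to 1 yields the kind of comparison the Figure 5 caption describes, with one policy trained per filtered label set.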

Theorems & Definitions (3)

Definition II.1: Local Lipschitz Continuity
Theorem II.2
Remark II.3
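As a rough illustration of what a local Lipschitz coefficient measures, the sketch below estimates one empirically for a scalar function. It relies on the standard notion that $f$ is locally $K$-Lipschitz near $x$ when $|f(x') - f(x)| \le K\,|x' - x|$ for all $x'$ in a neighborhood of $x$. The sampling estimator, the function names, and both toy functions are assumptions made for illustration, and the estimate is only a lower bound on the true coefficient.

```python
import math
import random

def local_lipschitz(f, x, radius, n_samples=1000, seed=0):
    """Empirical lower bound on the local Lipschitz coefficient of f at x.

    Samples points within `radius` of x and returns the largest observed
    ratio |f(x') - f(x)| / |x' - x|.
    """
    rng = random.Random(seed)
    fx = f(x)
    best = 0.0
    for _ in range(n_samples):
        dx = rng.uniform(-radius, radius)
        if dx == 0.0:
            continue
        best = max(best, abs(f(x + dx) - fx) / abs(dx))
    return best

def smooth(s):
    """Toy smooth map: its slope never exceeds 2 in magnitude."""
    return math.sin(2.0 * s)

def contact(s):
    """Toy map with a jump at s = 0, loosely mimicking a contact event."""
    return s + (1.0 if s > 0 else 0.0)

print(local_lipschitz(smooth, 1.0, 0.1))   # modest everywhere
print(local_lipschitz(contact, 1.0, 0.1))  # modest away from the jump
print(local_lipschitz(contact, 0.0, 0.1))  # large at the jump
```

The contrast between the last two calls mirrors the point made in the abstract: a function can be discontinuous somewhere and still be locally smooth over most of the region that the data covers.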
