Memory-Consistent Neural Networks for Imitation Learning

Kaustubh Sridhar; Souradeep Dutta; Dinesh Jayaraman; James Weimer; Insup Lee

TL;DR

This work revisits simple supervised ``behavior cloning'', which conveniently trains a policy from nothing more than pre-recorded demonstrations, but carefully designs the model class to counter the compounding error phenomenon. It also provides a guaranteed upper bound on the sub-optimality gap induced by memory-consistent neural network (MCNN) policies.

Abstract

Imitation learning considerably simplifies policy synthesis compared to alternative approaches by exploiting access to expert demonstrations. For such imitation policies, errors away from the training samples are particularly critical. Even rare slip-ups in the policy action outputs can compound quickly over time, since they lead to unfamiliar future states where the policy is still more likely to err, eventually causing task failures. We revisit simple supervised ``behavior cloning'' for conveniently training the policy from nothing more than pre-recorded demonstrations, but carefully design the model class to counter the compounding error phenomenon. Our ``memory-consistent neural network'' (MCNN) outputs are hard-constrained to stay within clearly specified permissible regions anchored to prototypical ``memory'' training samples. We provide a guaranteed upper bound for the sub-optimality gap induced by MCNN policies. Using MCNNs on 10 imitation learning tasks, with MLP, Transformer, and Diffusion backbones, spanning dexterous robotic manipulation and driving, proprioceptive inputs and visual inputs, and varying sizes and types of demonstration data, we find large and consistent gains in performance, validating that MCNNs are better-suited than vanilla deep neural networks for imitation learning applications. Website: https://sites.google.com/view/mcnn-imitation
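The idea of outputs that are hard-constrained to regions anchored at memories can be illustrated with a minimal sketch. The functional form below (an exponential distance weight blending the nearest memory's action with a `tanh`-bounded network output) is an assumption made for illustration and is not stated in this summary. The names `nearest_memory`, `mcnn_output`, `net`, `lam`, and `L` are hypothetical.

```python
import numpy as np

def nearest_memory(x, mem_states, mem_actions):
    """Return the action stored with the memory closest to x, and that distance."""
    dists = np.linalg.norm(mem_states - x, axis=1)
    k = int(np.argmin(dists))
    return mem_actions[k], float(dists[k])

def mcnn_output(x, mem_states, mem_actions, net, lam=1.0, L=1.0):
    """Blend the nearest memory's action with a bounded network output.

    Assumed form (illustrative only):
        w = exp(-lam * d)
        y = w * a_mem + (1 - w) * L * tanh(net(x))
    At a memory (d = 0) the output equals the stored action. Away from it,
    the deviation |y - a_mem| is at most (1 - w) * (L + |a_mem|), no matter
    what the unconstrained network `net` returns.
    """
    a_mem, d = nearest_memory(x, mem_states, mem_actions)
    w = np.exp(-lam * d)
    return w * a_mem + (1.0 - w) * L * np.tanh(net(x))

if __name__ == "__main__":
    mem_s = np.array([[0.0, 0.0], [1.0, 1.0]])   # memory states
    mem_a = np.array([[0.5], [-0.5]])            # actions stored with them
    wild_net = lambda x: np.array([1e6])         # deliberately badly behaved network
    print(mcnn_output(mem_s[0], mem_s, mem_a, wild_net))            # at a memory
    print(mcnn_output(np.array([0.1, 0.0]), mem_s, mem_a, wild_net))  # near a memory
```

In this sketch a larger `lam` makes the weight on the memory decay faster with distance, which widens the region the network is allowed to move in. This is consistent with the Figure 2 caption's remark that the permissible regions can be modulated by increasing $\lambda$.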

Paper Structure (21 sections, 4 theorems, 6 equations, 15 figures, 6 tables, 2 algorithms)

Introduction
Related Work
Problem Formulation
Approach
The Model Class: Memory-Consistent Neural Networks
Theoretical Analysis of MCNNs for Imitation Learning
Algorithm: Imitation Learning With MCNN Policies
Neural gas.
Learning memories.
Training MCNNs.
Experimental Evaluation
Conclusions and Limitations.
Acknowledgements:
Reproducibility Statement:
Examples of the MCNN Function Classes
...and 6 more sections

Key Result

Lemma 4.5

Let $\mathcal{B}_i$ and $\mathcal{B}_j$ be two memory code-books such that $\mathcal{B}_i \subseteq \mathcal{B}_j$. Then $d^I_{\mathcal{B}_i\vert_S} \geq d^I_{\mathcal{B}_j\vert_S}$.
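A small numerical check can make the lemma concrete. The sketch below assumes that $d^I_{\mathcal{B}\vert_S}$ denotes the distance from the most isolated state in $S$ to its nearest memory in $\mathcal{B}$, that is, $\max_{s \in S} \min_{m \in \mathcal{B}} d(s, m)$ with Euclidean $d$. That reading is inferred from the name of Definition 4.4 and is not spelled out in this summary. The function name `most_isolated_distance` and the random data are hypothetical.

```python
import numpy as np

def most_isolated_distance(states, memories):
    """Largest nearest-memory distance over all states (assumed reading of d^I)."""
    # pairwise[a, b] = Euclidean distance between states[a] and memories[b]
    pairwise = np.linalg.norm(states[:, None, :] - memories[None, :, :], axis=-1)
    return float(pairwise.min(axis=1).max())

rng = np.random.default_rng(0)
S = rng.normal(size=(200, 3))                             # stand-in for training states
B_j = S[rng.choice(len(S), size=20, replace=False)]       # larger code-book
B_i = B_j[:5]                                             # nested subset: B_i is contained in B_j

d_i = most_isolated_distance(S, B_i)
d_j = most_isolated_distance(S, B_j)
print(f"d^I with 5 memories:  {d_i:.3f}")
print(f"d^I with 20 memories: {d_j:.3f}")
print("d_i >= d_j:", d_i >= d_j)
```

Under this reading the inequality follows directly: every state's nearest memory in $\mathcal{B}_i$ is also a candidate in $\mathcal{B}_j$, so adding memories can only shrink each nearest-memory distance, and hence the maximum over states.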

Figures (15)

Figure 1: MCNN significantly improves performance on realistic demonstration datasets. We plot the percentage increase in return with MCNN over D4RL BC \cite{fu2020d4rl} for varying numbers of demonstrations across many tasks. In this plot, each point is a separate MCNN policy. We see significant improvements in the few-demonstrations regime, where most realistic imitation learning tasks can be found. The choice of model class is crucial in such regimes, and MCNN shines. Additional details are in Appendix \ref{app:more_details}.
Figure 2: The elements of the MCNN model class. In the top row, the left panel shows the nearest memory neighbour component with memories subsampled from the training dataset shown in red circles. The middle panel depicts the constrained neural network function class, where the blue shaded regions represent the permissible regions; by design, the function cannot take values outside these shaded regions. Finally, the right panel shows the combined MCNN model class. The size of the permissible regions can be modulated by increasing $\lambda$ (bottom left) or by decreasing the number of memories (bottom right). The second row shows many such MCNN model families with increasing capacity. For additional plots, see Appendix \ref{app:example_function_classes}.
Figure 3: The environments here include: Adroit Pen, Hammer, Relocate, and Door \cite{rajeswaran2017adroit}, CARLA's Town03 and Town04 \cite{carla}, and Franka Kitchen \cite{gupta2019relay_policy_learning}. The four Adroit environments and Franka Kitchen have proprioceptive observations and the CARLA environment has image observations.
Figure 4: Adroit human tasks [25 demos]: Comparison of returns (across 20 evaluation trajectories and 3 random seeds) between baselines and our methods (MCNN+BeT, MCNN+Diff, and MCNN+MLP). Our MCNN methods use the same fixed set of hyperparameters across all tasks.
Figure 5: Adroit expert tasks [5000 demos]: Comparison of returns (across 20 evaluation trajectories and 3 random seeds) between baselines and our methods (MCNN+BeT, MCNN+Diff, and MCNN+MLP). Our MCNN methods use the same fixed set of hyperparameters across all tasks.
...and 10 more figures

Theorems & Definitions (8)

Definition 4.1: Nearest Memory Neighbor Function
Definition 4.2: Memory-Consistent Neural Network
Definition 4.4: Most Isolated State
Lemma 4.5
Lemma 4.6
Theorem 4.7
Corollary 4.8
Definition 4.9: Neural Gas
