Memory-Consistent Neural Networks for Imitation Learning

Kaustubh Sridhar, Souradeep Dutta, Dinesh Jayaraman, James Weimer, Insup Lee

TL;DR

This work revisits simple supervised ``behavior cloning'' for conveniently training the policy from nothing more than pre-recorded demonstrations, but carefully designs the model class to counter the compounding error phenomenon, and provides a guaranteed upper bound for the sub-optimality gap induced by memory-consistent neural network (MCNN) policies.

Abstract

Imitation learning considerably simplifies policy synthesis compared to alternative approaches by exploiting access to expert demonstrations. For such imitation policies, errors away from the training samples are particularly critical. Even rare slip-ups in the policy action outputs can compound quickly over time, since they lead to unfamiliar future states where the policy is still more likely to err, eventually causing task failures. We revisit simple supervised ``behavior cloning'' for conveniently training the policy from nothing more than pre-recorded demonstrations, but carefully design the model class to counter the compounding error phenomenon. Our ``memory-consistent neural network'' (MCNN) outputs are hard-constrained to stay within clearly specified permissible regions anchored to prototypical ``memory'' training samples. We provide a guaranteed upper bound for the sub-optimality gap induced by MCNN policies. Using MCNNs on 10 imitation learning tasks, with MLP, Transformer, and Diffusion backbones, spanning dexterous robotic manipulation and driving, proprioceptive inputs and visual inputs, and varying sizes and types of demonstration data, we find large and consistent gains in performance, validating that MCNNs are better-suited than vanilla deep neural networks for imitation learning applications. Website: https://sites.google.com/view/mcnn-imitation

Paper Structure

This paper contains 21 sections, 4 theorems, 6 equations, 15 figures, 6 tables, 2 algorithms.

Key Result

Lemma 4.5

Assume two memory code-books $\mathcal{B}_i,\; \mathcal{B}_j$ such that $\mathcal{B}_i \subseteq \mathcal{B}_j$. Then $d^I_{\mathcal{B}_i\vert_S} \geq d^I_{\mathcal{B}_j\vert_S}$.
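
The monotonicity in Lemma 4.5 can be checked numerically on a toy example. The sketch below assumes that $d^I_{\mathcal{B}\vert_S}$ denotes the distance from the most isolated state in $S$ to its nearest memory in $\mathcal{B}$ (Definition 4.4 is only named in this summary, so this reading is an assumption). The function name `most_isolated_distance` and the one-dimensional states are hypothetical, chosen only for illustration.

```python
import math

def most_isolated_distance(states, codebook):
    """Largest distance from any state to its nearest memory in the codebook.

    Assumed reading of d^I_{B|S}; not taken verbatim from the paper.
    """
    return max(min(math.dist(s, m) for m in codebook) for s in states)

# Toy 1-D state set S and two nested code-books, B_i being a subset of B_j.
states = [[0.0], [0.4], [1.0], [2.5]]
B_i = [[0.0], [1.0]]
B_j = B_i + [[2.0]]  # adding a memory can only bring states closer to one

d_i = most_isolated_distance(states, B_i)
d_j = most_isolated_distance(states, B_j)
print(d_i >= d_j)
```

Under this reading the inequality is intuitive: adding memories never increases any state's distance to its nearest memory, so the maximum over states cannot grow either.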

Figures (15)

  • Figure 1: MCNN significantly improves performance on realistic demonstration datasets. We plot the percentage increase in return with MCNN over D4RL BC \cite{fu2020d4rl} for various numbers of demonstrations across many tasks. In this plot, each point is a separate MCNN policy. We see significant improvements in the few-demonstrations regime, where most realistic imitation learning tasks can be found. The choice of model class is crucial in such regimes, and MCNN shines. Additional details are in Appendix \ref{app:more_details}.
  • Figure 2: The elements of the MCNN model class. In the top row, the left panel shows the nearest memory neighbour component, with memories subsampled from the training dataset shown as red circles. The middle panel depicts the constrained neural network function class, where the blue shaded regions represent the permissible regions; by design, the function cannot take values outside these shaded regions. Finally, the right panel shows the combined MCNN model class. The size of the permissible regions can be modulated by increasing $\lambda$ (bottom left) or by decreasing the number of memories (bottom right). The second row shows many such MCNN model families with increasing capacity. For additional plots, see Appendix \ref{app:example_function_classes}.
  • Figure 3: The environments here include: Adroit Pen, Hammer, Relocate, and Door \cite{rajeswaran2017adroit}, CARLA's Town03 and Town04 \cite{carla}, and Franka Kitchen \cite{gupta2019relay_policy_learning}. The four Adroit environments and Franka Kitchen have proprioceptive observations, and the CARLA environment has image observations.
  • Figure 4: Adroit human tasks [25 demos]: Comparison of returns (across 20 evaluation trajectories and 3 random seeds) between baselines and our methods (MCNN+BeT, MCNN+Diff, and MCNN+MLP). Our MCNN methods use the same fixed set of hyperparameters across all tasks.
  • Figure 5: Adroit expert tasks [5000 demos]: Comparison of returns (across 20 evaluation trajectories and 3 random seeds) between baselines and our methods (MCNN+BeT, MCNN+Diff, and MCNN+MLP). Our MCNN methods use the same fixed set of hyperparameters across all tasks.
  • ...and 10 more figures
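
The Figure 2 caption describes a nearest-memory component combined with a constrained network whose permissible regions widen with $\lambda$. The sketch below is one plausible way to realize that behavior, not the paper's definition: the blending weight $w = e^{-\lambda d}$, the `tanh` squashing, and the names `nearest_memory`, `mcnn_output`, `scale`, and `net` are all assumptions made for illustration.

```python
import math

def nearest_memory(x, memories):
    """Return (distance, action) for the memory state closest to x."""
    state, action = min(memories, key=lambda m: math.dist(x, m[0]))
    return math.dist(x, state), action

def mcnn_output(x, memories, net, lam=1.0, scale=1.0):
    """Blend the nearest memory's action with a bounded network output.

    Assumed form: out = w * a_mem + scale * (1 - w) * tanh(net(x)),
    with w = exp(-lam * d). Each output coordinate then lies within
    scale * (1 - w) of w * a_mem, whatever the network produces.
    """
    d, a_mem = nearest_memory(x, memories)
    w = math.exp(-lam * d)  # equals 1 at a memory, decays with distance
    raw = net(x)
    return [w * a + scale * (1.0 - w) * math.tanh(r) for a, r in zip(a_mem, raw)]

# Hypothetical memories: (state, action) pairs subsampled from demonstrations.
memories = [([0.0, 0.0], [0.5]), ([1.0, 1.0], [-0.5])]
# Arbitrary unconstrained stand-in for a trained network.
net = lambda x: [10.0 * (x[0] - x[1]) + 3.0]

at_memory = mcnn_output([0.0, 0.0], memories, net)  # reproduces the memory action
nearby = mcnn_output([0.1, 0.0], memories, net)     # confined to a narrow band
print(at_memory, nearby)
```

In this sketch, a larger `lam` makes `w` decay faster and so widens the band available to the network, and fewer memories increase typical distances with the same effect, which matches the qualitative behavior described in the caption.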

Theorems & Definitions (8)

  • Definition 4.1: Nearest Memory Neighbor Function
  • Definition 4.2: Memory-Consistent Neural Network
  • Definition 4.4: Most Isolated State
  • Lemma 4.5
  • Lemma 4.6
  • Theorem 4.7
  • Corollary 4.8
  • Definition 4.9: Neural Gas