Stable-BC: Controlling Covariate Shift with Stable Behavior Cloning
Shaunak A. Mehta, Yusuf Umut Ciftci, Balamurugan Ramachandran, Somil Bansal, Dylan P. Losey
TL;DR
This work tackles covariate shift in behavior cloning by modeling the error dynamics between the robot's current trajectory and the demonstrated trajectories. Linearizing these error dynamics gives $\dot z = A z$, from which the authors derive local stability conditions and introduce Stable-BC, which augments the standard BC loss with a stability term that encourages convergence toward expert behaviors. In model-based settings, the full $A$ matrix is used and stability is enforced via an eigenvalue penalty; in model-free settings, bounded stability is achieved by controlling $A_1$ and minimizing $\|A_2\|$, yielding a data-efficient approach. Empirical results across interactive driving, nonlinear quadrotor navigation, visual perception tasks, and a real air hockey experiment show that Stable-BC improves robustness to covariate shift and can reduce the amount of demonstration data required while producing smoother, more reliable policies.
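The model-based idea summarized above can be illustrated with a minimal sketch. It assumes known dynamics $\dot x = f(x, u)$, approximates $A$ by a finite-difference Jacobian of the closed-loop dynamics, and penalizes the positive real parts of its eigenvalues. The function names (`jacobian_fd`, `stability_penalty`, `stable_bc_loss`), the weight `lam`, and the exact form of the penalty are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def jacobian_fd(fn, x, eps=1e-5):
    """Central-difference Jacobian of fn at x (illustrative stand-in for autodiff)."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(fn(x), dtype=float)
    J = np.zeros((f0.size, x.size))
    for i in range(x.size):
        dx = np.zeros(x.size)
        dx[i] = eps
        J[:, i] = (np.asarray(fn(x + dx)) - np.asarray(fn(x - dx))) / (2 * eps)
    return J

def stability_penalty(A):
    """Sum of positive real parts of the eigenvalues of A (assumed penalty form)."""
    eig = np.linalg.eigvals(A)
    return float(np.sum(np.maximum(eig.real, 0.0)))

def stable_bc_loss(policy, dynamics, states, actions, lam=1.0):
    """Standard BC loss plus a weighted stability term (assumed combination)."""
    bc = np.mean([np.sum((policy(x) - a) ** 2) for x, a in zip(states, actions)])
    stab = np.mean([
        stability_penalty(jacobian_fd(lambda s: dynamics(s, policy(s)), x))
        for x in states
    ])
    return float(bc + lam * stab)

# Toy example: single-integrator dynamics (x_dot = u) with linear policies.
dynamics = lambda x, u: u
stable_policy = lambda x: -1.0 * x    # closed-loop A = -I, eigenvalues at -1
unstable_policy = lambda x: 0.5 * x   # closed-loop A = 0.5 I, eigenvalues at +0.5

states = [np.array([0.3, -0.2])]
actions = [stable_policy(s) for s in states]  # "expert" labels for the toy case

loss_stable = stable_bc_loss(stable_policy, dynamics, states, actions)
loss_unstable = stable_bc_loss(unstable_policy, dynamics, states, actions)
```

In this toy case the stable policy incurs no stability penalty, while the unstable policy is penalized both for mismatching the expert actions and for having eigenvalues with positive real part. In a learned policy the Jacobian would typically come from automatic differentiation so that the penalty is differentiable with respect to the policy parameters.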
Abstract
Behavior cloning is a common imitation learning paradigm. Under behavior cloning the robot collects expert demonstrations, and then trains a policy to match the actions taken by the expert. This works well when the robot learner visits states where the expert has already demonstrated the correct action, but inevitably the robot will also encounter new states outside of its training dataset. If the robot learner takes the wrong action at these new states, it could move farther from the training data, which in turn leads to increasingly incorrect actions and compounding errors. Existing works try to address this fundamental challenge by augmenting or enhancing the training data. By contrast, in our paper we develop the control-theoretic properties of behavior cloned policies. Specifically, we consider the error dynamics between the system's current state and the states in the expert dataset. From the error dynamics we derive model-based and model-free conditions for stability: under these conditions the robot shapes its policy so that its current behavior converges towards example behaviors in the expert dataset. In practice, this results in Stable-BC, an easy-to-implement extension of standard behavior cloning that is provably robust to covariate shift. We demonstrate the effectiveness of our algorithm in simulations with interactive, nonlinear, and visual environments. We also conduct experiments where a robot arm uses Stable-BC to play air hockey. See our website here: https://collab.me.vt.edu/Stable-BC/
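To make the compounding-error argument concrete, the following toy sketch rolls out linear error dynamics $\dot z = A z$ with forward-Euler integration and compares an unstable and a stable choice of $A$. The matrices, initial error, step size, and the helper `rollout_error` are all hypothetical values picked for illustration; they do not come from the paper's experiments.

```python
import numpy as np

def rollout_error(A, z0, dt=0.01, steps=500):
    """Integrate z_dot = A z with forward Euler and return the error norm over time."""
    z = np.asarray(z0, dtype=float)
    norms = [np.linalg.norm(z)]
    for _ in range(steps):
        z = z + dt * (A @ z)
        norms.append(np.linalg.norm(z))
    return np.array(norms)

# Small initial deviation from the demonstrated states (assumed values).
z0 = np.array([0.1, -0.05])

# One eigenvalue with positive real part: the error compounds over time.
A_unstable = np.array([[0.5, 0.0],
                       [0.0, -1.0]])

# All eigenvalues with negative real part: the error decays toward zero.
A_stable = np.array([[-0.5, 0.0],
                     [0.0, -1.0]])

drift = rollout_error(A_unstable, z0)
recover = rollout_error(A_stable, z0)

print(f"unstable: initial error {drift[0]:.3f}, final error {drift[-1]:.3f}")
print(f"stable:   initial error {recover[0]:.3f}, final error {recover[-1]:.3f}")
```

With the unstable matrix the error norm grows well beyond its initial value, mirroring how a cloned policy can drift away from its training data; with the stable matrix the same initial deviation shrinks, which is the behavior the stability conditions are meant to encourage.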
