Efficient Deep Learning of Robust Policies from MPC using Imitation and Tube-Guided Data Augmentation
Andrea Tagliabue, Jonathan P. How
TL;DR
The paper tackles the data and compute inefficiency of imitation learning from expensive MPC demonstrations by introducing Sampling Augmentation (SA), which leverages Robust Tube MPC (RTMPC) to generate a tube of likely states under uncertainty. SA augments each demonstration with state-action samples drawn from inside the tube and labeled by the RTMPC ancillary controller, enabling efficient training and robust policy transfer to unseen disturbances, including sim2real. The authors develop both linear and nonlinear RTMPC extensions, using sensitivity-based data augmentation for the nonlinear case and a tube-guided data augmentation framework with optional fine-tuning. They report zero-shot transfer from a single or a few demonstrations and strong robustness in agile flight tasks. Experimental validation on a multirotor shows on-board inference in 15 microseconds and robust performance under wind and model errors, highlighting the practical value of the approach for data-efficient, robust policy learning in model-based control.
Abstract
Imitation Learning (IL) can generate computationally efficient policies from demonstrations provided by Model Predictive Control (MPC). However, IL methods often require extensive data-collection and training efforts, limiting changes to the policy if the task changes, and they produce policies with limited robustness to new disturbances. In this work, we propose an IL strategy to efficiently compress a computationally expensive MPC into a deep neural network policy that is robust to previously unseen disturbances. By using a robust variant of the MPC, called Robust Tube MPC, and leveraging properties from the controller, we introduce computationally efficient data augmentation methods that enable a significant reduction of the number of MPC demonstrations and training efforts required to generate a robust policy. Our approach opens the possibility of zero-shot transfer of a policy trained from a single MPC demonstration collected in a nominal domain, such as a simulation or a robot in a lab/controlled environment, to a new domain with previously unseen bounded model errors/perturbations. Numerical evaluations performed using linear and nonlinear MPC for agile flight on a multirotor show that our method outperforms strategies commonly employed in IL (such as Dataset-Aggregation (DAgger) and Domain Randomization (DR)) in terms of demonstration-efficiency, training time, and robustness to perturbations unseen during training. Experimental evaluations validate the efficiency and real-world robustness.
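As a rough illustration of the tube-guided augmentation idea, the sketch below samples extra states around one nominal point of an MPC demonstration and labels each with a linear ancillary-controller action of the standard tube MPC form u = u_nom + K (x - x_nom). It is a minimal sketch, not the authors' implementation: the axis-aligned box used as the tube cross-section, the gain `K`, the function name `tube_augment`, and all numeric values are assumptions chosen for illustration.

```python
import numpy as np

def tube_augment(x_nom, u_nom, K, tube_half_width, n_samples, rng):
    """Sample states near x_nom and label them with the ancillary controller.

    Assumption: the tube cross-section is approximated by an axis-aligned box
    with the given half-widths, centered at the nominal state.
    """
    x_nom = np.asarray(x_nom, dtype=float)
    u_nom = np.asarray(u_nom, dtype=float)
    # Uniform offsets inside the assumed box-shaped tube cross-section.
    offsets = rng.uniform(-tube_half_width, tube_half_width,
                          size=(n_samples, x_nom.size))
    states = x_nom + offsets
    # Ancillary controller: u = u_nom + K (x - x_nom), applied row by row.
    actions = u_nom + offsets @ K.T
    return states, actions

# Hypothetical 2-state, 1-input example.
rng = np.random.default_rng(0)
K = np.array([[-1.0, -0.5]])        # assumed ancillary feedback gain
x_nom = np.array([0.2, 0.0])        # nominal state from one MPC step
u_nom = np.array([0.1])             # nominal action from the same step
half_width = np.array([0.05, 0.1])  # assumed tube half-widths per state

states, actions = tube_augment(x_nom, u_nom, K, half_width, 8, rng)
print(states.shape, actions.shape)
```

The resulting state-action pairs would be appended to the demonstration dataset before supervised training. Labeling them needs only a matrix product, with no further calls to the MPC solver, which is the source of the efficiency gain the abstract describes.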
