Efficient Deep Learning of Robust Policies from MPC using Imitation and Tube-Guided Data Augmentation

Andrea Tagliabue, Jonathan P. How

TL;DR

The paper tackles the data and compute inefficiency of imitation learning from expensive MPC demonstrations by introducing Sampling Augmentation (SA), a data augmentation strategy that uses the tube computed by Robust Tube MPC (RTMPC) to approximate the states the system is likely to visit under uncertainty. SA augments each demonstration with tube-guided state-action samples obtained via the RTMPC ancillary controller, enabling efficient training and robust policy transfer to unseen disturbances, including sim2real. The authors treat both linear and nonlinear RTMPC, using sensitivity-based data augmentation in the nonlinear case and offering optional fine-tuning, and they report zero-shot transfer from a single or a few demonstrations together with strong robustness in agile flight tasks. Experimental validation on a multirotor shows an average onboard inference time of 15 microseconds and robust performance under wind and model errors.

Abstract

Imitation Learning (IL) can generate computationally efficient policies from demonstrations provided by Model Predictive Control (MPC). However, IL methods often require extensive data-collection and training efforts, limiting changes to the policy if the task changes, and they produce policies with limited robustness to new disturbances. In this work, we propose an IL strategy to efficiently compress a computationally expensive MPC into a deep neural network policy that is robust to previously unseen disturbances. By using a robust variant of the MPC, called Robust Tube MPC, and leveraging properties from the controller, we introduce computationally efficient data augmentation methods that enable a significant reduction of the number of MPC demonstrations and training efforts required to generate a robust policy. Our approach opens the possibility of zero-shot transfer of a policy trained from a single MPC demonstration collected in a nominal domain, such as a simulation or a robot in a lab/controlled environment, to a new domain with previously unseen bounded model errors/perturbations. Numerical evaluations performed using linear and nonlinear MPC for agile flight on a multirotor show that our method outperforms strategies commonly employed in IL (such as Dataset-Aggregation (DAgger) and Domain Randomization (DR)) in terms of demonstration-efficiency, training time, and robustness to perturbations unseen during training. Experimental evaluations validate the efficiency and real-world robustness.
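
The augmentation idea described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes a linear ancillary controller of the form u+ = u_nom + K (x+ - x_nom), and the function name `ancillary_actions`, the gain `K`, the nominal pair `(x_nom, u_nom)`, and the box half-widths are all hypothetical placeholders chosen for the example.

```python
import numpy as np


def ancillary_actions(x_nom, u_nom, K, x_samples):
    """Label sampled states with actions from an assumed linear ancillary controller.

    x_nom:     (n_x,)   nominal state planned by the expert
    u_nom:     (n_u,)   nominal action planned by the expert
    K:         (n_u, n_x) feedback gain (assumption: linear state feedback)
    x_samples: (N, n_x) extra states sampled inside the tube
    """
    # u+ = u_nom + K (x+ - x_nom), evaluated for all samples at once
    return u_nom + (x_samples - x_nom) @ K.T


# Hypothetical 2-state, 1-input example
rng = np.random.default_rng(0)
x_nom = np.array([0.0, 0.0])
u_nom = np.array([0.2])
K = np.array([[-1.0, -1.5]])
half_width = np.array([0.1, 0.3])  # assumed axis-aligned tube approximation

# Sample extra states around the nominal state, then label them without
# solving any additional optimization problem.
x_plus = x_nom + rng.uniform(-half_width, half_width, size=(5, 2))
u_plus = ancillary_actions(x_nom, u_nom, K, x_plus)
print(np.hstack([x_plus, u_plus]))
```

Each extra pair costs one matrix-vector product rather than one MPC solve, which is where the claimed reduction in demonstration and training effort would come from under these assumptions.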

Paper Structure

This paper contains 31 sections, 33 equations, 15 figures, 8 tables, 2 algorithms.

Figures (15)

  • Figure 1: Time-lapse of a multirotor performing a flip using a DNN policy learned via the proposed approach. The policy is learned offboard efficiently (in only $\boldsymbol{100}$ s of training time) and deployed onboard (NVIDIA Jetson TX2, CPU), tested at up to $\boldsymbol{500}$ Hz, with an average inference time of $\boldsymbol{15}$ $\boldsymbol{\mu}$s. (a) Upwards acceleration phase (red arrow: thrust vector, yellow arrow: trajectory). (b) $360^\circ$ rotation around the body $x$-axis in $\sim 0.5$ s. (c) Deceleration phase.
  • Figure 2: Overview of the approach proposed to generate a DNN-based policy $\pi_\theta$ from a computationally expensive MPC in a data- and compute-efficient way. We do so by generating an RTMPC using bounds of the disturbances encountered in the deployment domain. We use properties of the tube to derive a computationally efficient DA strategy that generates extra state-action pairs $(x^+, u^+)$, obtaining $\pi_{\hat{\theta}^*}$ via IL. Our approach enables zero-shot transfer from a single demonstration collected in simulation (sim2real) or a controlled environment (lab, factory, lab2real).
  • Figure 3: Illustration of the sequence of robust control invariant sets ${\mathbb{Z} \oplus \bar{\mathbf{x}}_0^*(\mathbf{x}_t)}$ computed by RTMPC for a system with state $\mathbf{x}_t$ and dimension $n_x = 2$.
  • Figure 4: The two possible strategies to sample extra state-action pairs from an axis-aligned bounding box that approximates the robust control invariant set of the RTMPC expert: dense (left) and sparse (right). The diagram is for a system with state dimension $n_x = 3$.
  • Figure 5: Robustness (Success Rate) in the task of flying along a figure-8 trajectory ($7$ s long), with wind-like disturbances (right, target domain $\mathcal{T}_1$) and without (left, source domain $\mathcal{S}$), starting from different initial states. Evaluation across $10$ random seeds, $10$ times per demonstration per seed. Shaded lines are the $95\%$ confidence interval. The lines for the SA-based methods overlap.
  • ...and 10 more figures
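
The two sampling strategies named in the Figure 4 caption can be illustrated as follows. This is a sketch under stated assumptions: the caption does not define the strategies, so the code assumes that "dense" means drawing states uniformly from the interior of the axis-aligned bounding box and that "sparse" means taking only its vertices. The function names `sample_dense` and `sample_sparse` and all numeric values are hypothetical.

```python
import itertools

import numpy as np


def sample_dense(center, half_width, n_samples, rng):
    """Assumed dense strategy: uniform samples inside the bounding box."""
    offsets = rng.uniform(-half_width, half_width,
                          size=(n_samples, center.shape[0]))
    return center + offsets


def sample_sparse(center, half_width):
    """Assumed sparse strategy: the 2**n_x vertices of the bounding box."""
    n_x = center.shape[0]
    # Every combination of -1/+1 per axis selects one corner of the box
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=n_x)))
    return center + signs * half_width


# Hypothetical box for a system with n_x = 3, as in the figure
rng = np.random.default_rng(0)
center = np.array([1.0, 0.0, -0.5])
half_width = np.array([0.2, 0.1, 0.05])

dense_states = sample_dense(center, half_width, n_samples=100, rng=rng)
sparse_states = sample_sparse(center, half_width)
print(dense_states.shape, sparse_states.shape)
```

Under these assumptions the sparse variant yields a fixed number of samples per timestep that grows exponentially with the state dimension, while the dense variant lets the number of samples be chosen freely.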