Tube-NeRF: Efficient Imitation Learning of Visuomotor Policies from MPC using Tube-Guided Data Augmentation and NeRFs

Andrea Tagliabue, Jonathan P. How

TL;DR

Tube-NeRF tackles the sample-inefficiency of imitation learning for visuomotor policies by grounding demonstrations in an output-feedback RTMPC that accounts for process and sensing uncertainties. It then uses a NeRF-based data augmentation pipeline guided by the MPC tube to generate relevant synthetic views and actions, refining the policy with both synthetic and real observations. The approach yields substantial improvements in demonstration efficiency and training time, enabling real-time onboard inference (1.5 ms) for vision-based localization and trajectory tracking on a multirotor, with robust performance under disturbances. This work bridges robust model-based control, photorealistic view synthesis, and end-to-end visuomotor learning to deliver practical, robust, image-based flight policies with reduced sim-to-real gaps.

Abstract

Imitation learning (IL) can train computationally efficient sensorimotor policies from a resource-intensive Model Predictive Controller (MPC), but it often requires many samples, leading to long training times or limited robustness. To address these issues, we combine IL with a variant of robust MPC that accounts for process and sensing uncertainties, and we design a data augmentation (DA) strategy that enables efficient learning of vision-based policies. The proposed DA method, named Tube-NeRF, leverages Neural Radiance Fields (NeRFs) to generate novel synthetic images, and uses properties of the robust MPC (the tube) to select relevant views and to efficiently compute the corresponding actions. We tailor our approach to the task of localization and trajectory tracking on a multirotor, by learning a visuomotor policy that generates control actions using images from the onboard camera as the only source of horizontal position. Numerical evaluations show an 80-fold increase in demonstration efficiency and a 50% reduction in training time over current IL methods. Additionally, our policies successfully transfer to a real multirotor, achieving low tracking errors despite large disturbances, with an onboard inference time of only 1.5 ms. Video: https://youtu.be/_W5z33ZK1m4
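
The tube-guided augmentation step described in the abstract can be illustrated with a minimal sketch. It assumes the tube cross-section is approximated by an axis-aligned box around the reference state and that the ancillary controller takes the standard tube-MPC linear feedback form $\mathbf{u} = \bar{\mathbf{u}} + \mathbf{K}(\mathbf{x} - \bar{\mathbf{x}})$. The function names, the two-dimensional state, the gain `K`, and the tube half-widths are hypothetical values chosen for illustration and are not taken from the paper. The NeRF rendering of an image for each sampled state is omitted.

```python
import numpy as np


def sample_tube_states(x_ref, half_widths, n_samples, rng):
    """Draw states uniformly from an axis-aligned box centered on x_ref.

    The box is an assumed stand-in for the tube cross-section Z.
    """
    offsets = rng.uniform(-half_widths, half_widths,
                          size=(n_samples, x_ref.size))
    return x_ref + offsets


def ancillary_actions(x_samples, x_ref, u_ref, K):
    """Linear ancillary feedback: u = u_ref + K (x - x_ref), one row per sample."""
    return u_ref + (x_samples - x_ref) @ K.T


rng = np.random.default_rng(0)
x_ref = np.array([1.0, 0.0])        # hypothetical reference state (position, velocity)
u_ref = np.array([0.5])             # hypothetical reference action
K = np.array([[-2.0, -1.0]])        # hypothetical stabilizing feedback gain
half_widths = np.array([0.2, 0.1])  # hypothetical tube half-widths

# Extra (state, action) pairs obtained without querying the expert again.
x_plus = sample_tube_states(x_ref, half_widths, 8, rng)
u_plus = ancillary_actions(x_plus, x_ref, u_ref, K)
```

In this sketch each augmented action costs one matrix-vector product, which is the reason the ancillary controller is cheaper than re-solving the MPC optimization for every synthetic view.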

Paper Structure (17 sections, 15 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Tube-NeRF collects a real-world demonstration using output-feedback tube MPC, a robust MPC which accounts for process and sensing uncertainties through its tube cross-section $\mathbb{Z}$. Then, it generates a NeRF of the environment from the collected images $\mathcal{I}_t$, and uses the tube's cross-section to guide the selection of synthetic views $\mathcal{I}_t^+$ from the NeRF for data augmentation, while corresponding actions are obtained via the ancillary controller, an integral component of the tube MPC framework.
  • Figure 2: Output feedback RTMPC generates and tracks a safe reference to satisfy constraints.
  • Figure 3: Architecture of the employed visuomotor student policy. The policy takes as input a raw camera image, a reference trajectory $\mathbf{x}^\text{des}_{0|t}, \dots, \mathbf{x}^\text{des}_{N|t}$, and noisy measurements of the altitude $p^m_z$, velocity $\boldsymbol{v}_t$, and tilt (roll $\varphi_t$, pitch $\vartheta_t$) of the multirotor. It outputs an action $\mathbf{u}_t$, representing desired roll, pitch, and thrust set-points for the cascaded attitude controller. The policy additionally outputs an estimate of the state $\hat{\mathbf{x}}_t$, which was found useful to promote learning of features relevant to position estimation.
  • Figure 4: Episode length (timesteps before a state constraint violation, up to $300$) vs. the number of demonstrations collected from the expert, and vs. the training time (the time required to collect such demonstrations in simulation and to train the policy). Policies trained with Tube-NeRF (TN) achieve full episode length after a single demonstration, and require less than half the training time of the best-performing baselines (DR-based methods). Note that the lines of the Tube-NeRF-based approaches vs. the number of demonstrations overlap. Shaded areas are $95\%$ confidence intervals. To focus our study on the effects of process uncertainties and sensing noise, we apply neither visual changes to the environment nor the robustification to visual changes. Evaluations across $10$ seeds, $10$ times per seed.
  • Figure 5: Qualitative evaluation in experiments, highlighting the high velocity and the challenging 3D motion that the student policy can execute under uncertainties.
  • ...and 3 more figures
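
The episode-length metric in the Figure 4 caption can be made concrete with a short sketch. It assumes the state constraints are box bounds and that the confidence interval uses a normal approximation of the mean. Both are assumptions for illustration, as the caption does not specify either. The function names and the example trajectory are hypothetical.

```python
import numpy as np


def episode_length(states, lower, upper, max_steps=300):
    """Number of timesteps before the first state constraint violation.

    Returns the rollout length (capped at max_steps) if no violation occurs.
    Box constraints [lower, upper] are an assumption of this sketch.
    """
    states = np.asarray(states)[:max_steps]
    violated = np.any((states < lower) | (states > upper), axis=1)
    if violated.any():
        return int(np.argmax(violated))  # index of the first violating step
    return len(states)


def mean_with_ci95(values):
    """Mean and 95% half-width using a normal approximation (assumed)."""
    values = np.asarray(values, dtype=float)
    half_width = 1.96 * values.std(ddof=1) / np.sqrt(values.size)
    return values.mean(), half_width


# Hypothetical rollout: the state leaves the bounds at step 120.
rollout = np.zeros((300, 2))
rollout[120:] = 5.0
length = episode_length(rollout, lower=-1.0, upper=1.0)
```

A policy that never violates the constraints scores the full $300$ steps, which is the "full episode length" referred to in the caption.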