Tube-NeRF: Efficient Imitation Learning of Visuomotor Policies from MPC using Tube-Guided Data Augmentation and NeRFs
Andrea Tagliabue, Jonathan P. How
TL;DR
Tube-NeRF tackles the sample inefficiency of imitation learning for visuomotor policies by collecting demonstrations from an output-feedback robust tube MPC (RTMPC) that accounts for process and sensing uncertainties. It then uses a NeRF-based data augmentation pipeline, guided by the MPC tube, to generate relevant synthetic views and their corresponding actions, so the policy is trained on both synthetic and real observations. In numerical evaluations the approach yields an 80-fold increase in demonstration efficiency and a 50% reduction in training time over current IL methods. The learned policy runs onboard in 1.5 ms and performs vision-based localization and trajectory tracking on a real multirotor, with low tracking errors despite large disturbances. The work combines robust model-based control, photorealistic view synthesis, and end-to-end visuomotor learning to produce practical image-based flight policies that transfer from simulation to hardware.
Abstract
Imitation learning (IL) can train computationally efficient sensorimotor policies from a resource-intensive Model Predictive Controller (MPC), but it often requires many samples, leading to long training times or limited robustness. To address these issues, we combine IL with a variant of robust MPC that accounts for process and sensing uncertainties, and we design a data augmentation (DA) strategy that enables efficient learning of vision-based policies. The proposed DA method, named Tube-NeRF, leverages Neural Radiance Fields (NeRFs) to generate novel synthetic images, and uses properties of the robust MPC (the tube) to select relevant views and to efficiently compute the corresponding actions. We tailor our approach to the task of localization and trajectory tracking on a multirotor, by learning a visuomotor policy that generates control actions using images from the onboard camera as the only source of horizontal position. Numerical evaluations show an 80-fold increase in demonstration efficiency and a 50% reduction in training time over current IL methods. Additionally, our policies successfully transfer to a real multirotor, achieving low tracking errors despite large disturbances, with an onboard inference time of only 1.5 ms. Video: https://youtu.be/_W5z33ZK1m4
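To make the tube-guided augmentation idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the structure commonly used in tube MPC: the tube is approximated by an axis-aligned box around the nominal state, and the action at any state inside the tube is given by an affine ancillary feedback law u = u_nominal + K (x - x_nominal). The function names, the gain K, the box half-widths, and all numeric values are hypothetical. Rendering an image at each sampled state with a NeRF is omitted.

```python
import numpy as np


def sample_tube_states(x_nominal, tube_half_widths, n_samples, rng):
    """Draw states uniformly from a box around the nominal state.

    Assumption: the tube cross-section is approximated by an axis-aligned box.
    """
    offsets = rng.uniform(
        -tube_half_widths, tube_half_widths, size=(n_samples, x_nominal.size)
    )
    return x_nominal + offsets


def ancillary_actions(x_samples, x_nominal, u_nominal, K):
    """Compute u = u_nominal + K (x - x_nominal) for every sampled state.

    Assumption: an affine ancillary controller, so no extra MPC solve is
    needed for the augmented samples.
    """
    return u_nominal + (x_samples - x_nominal) @ K.T


# Hypothetical toy values: 3-dimensional state, 2-dimensional action.
rng = np.random.default_rng(0)
x_nominal = np.array([0.0, 0.0, 1.0])
u_nominal = np.array([0.0, 0.0])
tube_half_widths = np.array([0.2, 0.2, 0.1])
K = np.array([[-1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0]])

x_aug = sample_tube_states(x_nominal, tube_half_widths, n_samples=5, rng=rng)
u_aug = ancillary_actions(x_aug, x_nominal, u_nominal, K)

# Each (x_aug[i], u_aug[i]) pair would be matched with an image rendered at
# the camera pose implied by x_aug[i] to form one extra training sample.
```

Under these assumptions, each demonstration step yields several extra state-action pairs at the cost of one matrix product, which is the kind of saving the abstract attributes to using the tube to compute actions efficiently.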
