Offline Deep Model Predictive Control (MPC) for Visual Navigation
Taha Bouzid, Youssef Alj
TL;DR
This work addresses visual navigation with a single RGB camera by learning an offline deep MPC policy that follows a sequence of subgoal images. It introduces ViewNet, which predicts a future image conditioned on the current view and a velocity command, and VelocityNet, which generates multi-step velocity commands under an MPC objective that minimizes image discrepancy while enforcing smooth velocities. Both networks are trained entirely offline in simulation. The approach is evaluated in a ROS Gazebo house environment with a prediction horizon $N=2$, an image-difference threshold $e_m=17$, and velocity bounds $v_{\max}=0.5$ m/s and $\omega_{\max}=1.0$ rad/s, and it tracks linear, rotational, and combined motions stably. The results suggest that offline deep MPC can provide accurate and safe visual navigation suited to embedded platforms; future work includes obstacle avoidance and real-world transfer using advanced view-synthesis techniques such as NeRF.
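The repeat-phase quantities listed above (the subgoal threshold $e_m$ and the velocity bounds) can be illustrated with a minimal sketch. The image-difference metric, the symmetric clipping of commands, and the function names `image_difference`, `clip_command`, and `next_subgoal_index` are all hypothetical: the text does not say how the difference is computed, so $e_m=17$ is treated here only as a number on an assumed 0-255 grayscale pixel scale.

```python
import numpy as np

# Values taken from the text; everything else in this sketch is assumed.
E_M = 17.0        # subgoal-switching threshold e_m
V_MAX = 0.5       # linear velocity bound, m/s
OMEGA_MAX = 1.0   # angular velocity bound, rad/s


def image_difference(current, subgoal):
    """Hypothetical metric: mean absolute difference of 8-bit pixels."""
    # Cast to float first so uint8 subtraction cannot wrap around.
    return float(np.mean(np.abs(current.astype(np.float64) - subgoal.astype(np.float64))))


def clip_command(v, omega):
    """Saturate a (v, omega) command, assuming symmetric bounds."""
    return float(np.clip(v, -V_MAX, V_MAX)), float(np.clip(omega, -OMEGA_MAX, OMEGA_MAX))


def next_subgoal_index(current, subgoals, idx):
    """Advance to the next subgoal once the current one counts as reached."""
    if idx < len(subgoals) - 1 and image_difference(current, subgoals[idx]) < E_M:
        return idx + 1
    return idx  # keep tracking the same subgoal (or stay on the last one)
```

Switching on a fixed threshold keeps the tracker simple: the policy only ever sees one subgoal image at a time, and the teach-phase sequence is consumed in order.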
Abstract
In this paper, we propose a new visual navigation method that relies on a single perspective RGB camera. Following the Visual Teach & Repeat (VT&R) methodology, the robot records a visual trajectory consisting of multiple subgoal images during the teach step. For the repeat step, we propose two network architectures, ViewNet and VelocityNet, whose combination allows the robot to follow the visual trajectory. ViewNet is trained to generate a future image from the current view and a velocity command. The generated future image, together with the subgoal image, is then used to train VelocityNet. We embed an offline Model Predictive Control (MPC) policy in VelocityNet with two goals: (1) reducing the difference between the current and subgoal images and (2) ensuring smooth trajectories by mitigating velocity discontinuities. Because the policy is trained offline, the computational cost at run time stays low, which makes the approach well suited to platforms with limited computational capabilities, such as embedded systems. We validate the approach in a simulation environment, demonstrating that our model effectively minimizes the metric error between the real and replayed trajectories.
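The dual MPC objective can be sketched as a loss over the horizon: roll a ViewNet stand-in forward with each of VelocityNet's $N$ commands, penalize the discrepancy between each predicted image and the subgoal, and penalize changes between consecutive commands. This is a minimal NumPy sketch under stated assumptions, not the authors' loss: the squared-error image term, the weight `smooth_weight`, the use of the previously executed command as the reference for the first velocity difference, and the callables `mpc_loss` and `view_model` are all hypothetical. In actual training, the loss would be differentiated with respect to VelocityNet's parameters in an autodiff framework; here it is only evaluated.

```python
import numpy as np


def mpc_loss(view_model, current_image, subgoal_image, commands,
             prev_command, smooth_weight=1.0):
    """Evaluate a hypothetical offline MPC objective over len(commands) steps.

    view_model(image, command) -> predicted next image (stand-in for ViewNet).
    commands: (N, 2) array of (v, omega), i.e. a VelocityNet output.
    """
    image = current_image
    image_cost = 0.0
    smooth_cost = 0.0
    last = np.asarray(prev_command, dtype=float)
    for u in np.asarray(commands, dtype=float):
        image = view_model(image, u)                                 # predict the next view
        image_cost += float(np.mean((image - subgoal_image) ** 2))  # goal 1: image discrepancy
        smooth_cost += float(np.sum((u - last) ** 2))                # goal 2: velocity jumps
        last = u
    return image_cost + smooth_weight * smooth_cost


def toy_view_model(image, u):
    """Toy stand-in: brightness shifts by the linear velocity v."""
    return image + u[0]


if __name__ == "__main__":
    start, goal = np.zeros((2, 2)), np.ones((2, 2))
    smooth = mpc_loss(toy_view_model, start, goal, [[0.5, 0.0], [0.5, 0.0]], [0.0, 0.0])
    jumpy = mpc_loss(toy_view_model, start, goal, [[1.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
    print(smooth, jumpy)  # 0.5 2.0
```

Both command sequences in the usage example bring the toy image to the subgoal, but the one with a velocity jump followed by a stop scores worse. This is the trade-off the smoothness term is meant to capture.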
