MonoMPC: Monocular Vision Based Navigation with Learned Collision Model and Risk-Aware Model Predictive Control
Basant Sharma, Prajyot Jadhav, Pranjal Paul, K. Madhava Krishna, Arun Kumar Singh
TL;DR
This work tackles monocular navigation in clutter by moving beyond noisy depth-based collision checks to a depth-conditioned probabilistic collision model that predicts a distribution over obstacle clearance for a given trajectory. The model feeds a risk-aware Model Predictive Control (MPC) framework that uses a novel risk metric based on Maximum Mean Discrepancy (MMD), which compares the predicted clearance distribution to a feasible boundary through a chance-constraint formulation. A task-aware training pipeline jointly optimizes the collision model and the risk estimator on safe and unsafe trajectories to calibrate uncertainty, yielding better-calibrated predictions and safer, faster navigation in real-world clutter. The approach runs in real time, shows robust improvements over ROSNAV, MonoNav, and NoMaD, and has strong potential for deployment on mobile platforms; future work includes adding temporal memory and handling dynamic environments.
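One way to picture an MMD-based risk on clearance: convert each predicted clearance sample d_i into a constraint violation r_i = max(0, d_safe - d_i), then measure how far the empirical distribution of r_i lies from a Dirac delta at zero (the "no violation" distribution). The sketch below is a minimal illustration under stated assumptions: the RBF kernel, the bandwidth `sigma`, the margin `d_safe`, and the function names are all hypothetical choices, not the paper's exact formulation.

```python
import math
from typing import Sequence


def gaussian_kernel(a: float, b: float, sigma: float) -> float:
    """RBF kernel k(a, b) = exp(-(a - b)^2 / (2 sigma^2))."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def mmd_risk(clearances: Sequence[float], d_safe: float, sigma: float = 0.1) -> float:
    """Squared MMD between the empirical distribution of violations
    r_i = max(0, d_safe - d_i) and a Dirac delta at 0.

    HYPOTHETICAL sketch: kernel, bandwidth and Dirac target are assumptions.
    """
    r = [max(0.0, d_safe - d) for d in clearances]
    n = len(r)
    # E[k(r, r')] over all pairs of violation samples (biased V-statistic).
    k_rr = sum(gaussian_kernel(a, b, sigma) for a in r for b in r) / (n * n)
    # E[k(r, 0)]: cross term against the Dirac target at zero violation.
    k_r0 = sum(gaussian_kernel(a, 0.0, sigma) for a in r) / n
    # k(0, 0) = 1 for the RBF kernel.
    return k_rr - 2.0 * k_r0 + 1.0
```

For example, `mmd_risk([0.8, 0.9, 1.2], d_safe=0.5)` returns exactly `0.0` because no sample violates the margin, while clearance samples that fall inside the margin give a positive score that grows as more probability mass sits deeper inside it. A planner could add such a score to its cost or bound it as a surrogate for a chance constraint.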
Abstract
Navigating unknown environments with a single RGB camera is challenging because the lack of depth information prevents reliable collision-checking. While some methods use estimated depth to build collision maps, we found that depth estimates from vision foundation models are too noisy for zero-shot navigation in cluttered environments. We propose an alternative approach: instead of using noisy estimated depth for direct collision-checking, we use it as a rich context input to a learned collision model. This model predicts the distribution of minimum obstacle clearance that the robot can expect for a given control sequence. At inference, these predictions inform a risk-aware MPC planner that minimizes the estimated collision risk. We propose a joint learning pipeline that co-trains the collision model and the risk metric using both safe and unsafe trajectories. Crucially, joint training ensures well-calibrated uncertainty in the collision model, which improves navigation in highly cluttered environments. In real-world experiments, our approach reduces the collision rate and improves goal-reaching rate and speed over several strong baselines.
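The inference loop described above (predict a clearance distribution for each candidate control sequence, then choose the sequence with the lowest task cost plus risk) can be sketched as one step of a sampling-based MPC. Everything here is a hypothetical stand-in: `toy_clearance_model` replaces the learned, depth-conditioned network with noisy analytic clearances around a single point obstacle, the candidates are constant-velocity unicycle arcs, and the risk is a plain Monte Carlo estimate of P(clearance < d_safe) rather than the MMD metric.

```python
import math
import random
from typing import List, Optional, Tuple

Control = Tuple[float, float]  # (linear velocity v [m/s], angular velocity w [rad/s])
Point = Tuple[float, float]


def rollout(controls: List[Control], dt: float = 0.1) -> List[Point]:
    """Integrate a unicycle model from the origin, heading +x; return visited (x, y) points."""
    x, y, th = 0.0, 0.0, 0.0
    pts: List[Point] = []
    for v, w in controls:
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        th += w * dt
        pts.append((x, y))
    return pts


def toy_clearance_model(controls: List[Control], obstacle: Point,
                        n_samples: int, rng: random.Random) -> List[float]:
    """HYPOTHETICAL stand-in for the learned collision model: samples of the minimum
    obstacle clearance along the rollout, perturbed by Gaussian noise (sd 0.05 m)."""
    d_min = min(math.dist(p, obstacle) for p in rollout(controls))
    return [d_min + rng.gauss(0.0, 0.05) for _ in range(n_samples)]


def collision_risk(clearances: List[float], d_safe: float) -> float:
    """Monte Carlo estimate of P(clearance < d_safe); a simple stand-in for the MMD risk."""
    return sum(d < d_safe for d in clearances) / len(clearances)


def plan(goal: Point, obstacle: Point, horizon: int = 20, n_candidates: int = 200,
         d_safe: float = 0.3, w_risk: float = 10.0,
         seed: int = 0) -> Tuple[Optional[List[Control]], float]:
    """One MPC step: sample constant-control sequences and keep the one minimizing
    final distance to goal + w_risk * estimated collision risk."""
    rng = random.Random(seed)
    best: Optional[List[Control]] = None
    best_cost = math.inf
    for _ in range(n_candidates):
        v, w = rng.uniform(0.0, 1.0), rng.uniform(-1.0, 1.0)
        controls = [(v, w)] * horizon  # constant-control primitive (assumption)
        end = rollout(controls)[-1]
        samples = toy_clearance_model(controls, obstacle, 32, rng)
        cost = math.dist(end, goal) + w_risk * collision_risk(samples, d_safe)
        if cost < best_cost:
            best, best_cost = controls, cost
    return best, best_cost
```

Calling `plan(goal=(1.5, 0.0), obstacle=(0.75, 0.0))` returns the lowest-cost candidate and its cost. In a receding-horizon loop, only the first control of that sequence would be applied before replanning from the next state and image. The weight `w_risk` trades goal progress against collision risk.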
