NaviDiffusor: Cost-Guided Diffusion Model for Visual Navigation
Yiming Zeng, Hao Ren, Shuhang Wang, Junlong Huang, Hui Cheng
TL;DR
NaviDiffusor presents a hybrid approach that blends classical cost constraints with a conditional diffusion model trained on path-RGB pairs for visual navigation. During inference, differentiable task-level and scene-specific costs guide the diffusion sampling, enabling the generation of multimodal, constraint-satisfying paths without retraining. The method demonstrates strong zero-shot generalization across indoor/outdoor, simulated/real-world scenarios, outperforming baselines in collision avoidance and success rate, and it includes a path-selection mechanism to ensure temporal consistency. Practical deployment is supported by RGB-only sensing, monocular depth estimation for collision costs, and a plug-and-play inference pipeline that leverages diffusion priors. The work highlights a scalable route to integrate explicit geometric constraints within learning-based planning for robust robotic navigation.
Abstract
Visual navigation, a fundamental challenge in mobile robotics, demands versatile policies to handle diverse environments. Classical methods leverage geometric solutions to minimize specific costs, which makes them adaptable to new scenarios, but they are prone to system errors due to their multi-modular design and reliance on hand-crafted rules. Learning-based methods, while achieving high planning success rates, struggle to generalize to unseen environments beyond the training data and often require extensive training. To address these limitations, we propose a hybrid approach that combines the strengths of learning-based methods and classical approaches for RGB-only visual navigation. Our method first trains a conditional diffusion model on diverse path-RGB observation pairs. During inference, it integrates the gradients of differentiable scene-specific and task-level costs, guiding the diffusion model to generate valid paths that meet the constraints. This approach alleviates the need for retraining, offering a plug-and-play solution. Extensive experiments in both indoor and outdoor settings, across simulated and real-world scenarios, demonstrate the zero-shot transfer capability of our approach, which achieves higher success rates and fewer collisions than baseline methods. Code will be released at https://github.com/SYSU-RoboticsLab/NaviD.
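
To make the inference-time guidance idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a DDPM-style reverse process over 2D waypoints and replaces the trained conditional model with a toy noise predictor that treats a straight line to the goal as the only clean path. The cost functions, the update rule (shifting each denoised mean down the cost gradient), and all names and parameters (`collision_grad`, `goal_grad`, `sample_path`, `guidance_scale`, the noise schedule) are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def collision_grad(path, obstacle, radius):
    """Gradient of a hypothetical scene cost: sum of max(0, radius - dist)^2."""
    diff = path - obstacle
    dist = np.linalg.norm(diff, axis=1, keepdims=True) + 1e-8
    penetration = np.maximum(0.0, radius - dist)
    return -2.0 * penetration * diff / dist

def goal_grad(path, goal):
    """Gradient of a hypothetical task cost: squared distance of last waypoint to goal."""
    grad = np.zeros_like(path)
    grad[-1] = 2.0 * (path[-1] - goal)
    return grad

def sample_path(prior_path, obstacle, goal, guidance_scale,
                steps=100, radius=0.6, seed=0):
    """DDPM-style reverse sampling with an optional cost-gradient shift."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.1, steps)      # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(prior_path.shape)  # start from Gaussian noise
    for t in reversed(range(steps)):
        # Toy stand-in for the trained model: the exact noise predictor when
        # the only clean sample is prior_path (no image conditioning here).
        eps = (x - np.sqrt(alpha_bars[t]) * prior_path) / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Guidance: move the denoised mean down the summed cost gradient.
        grad = collision_grad(mean, obstacle, radius) + goal_grad(mean, goal)
        mean = mean - guidance_scale * grad
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

def clearance(path, obstacle):
    """Smallest waypoint-to-obstacle distance along the path."""
    return float(np.min(np.linalg.norm(path - obstacle, axis=1)))

prior_path = np.linspace([0.0, 0.0], [4.0, 0.0], 9)  # straight line, 9 waypoints
obstacle = np.array([2.0, 0.05])                      # sits almost on the line
goal = np.array([4.0, 0.0])

unguided = sample_path(prior_path, obstacle, goal, guidance_scale=0.0)
guided = sample_path(prior_path, obstacle, goal, guidance_scale=0.5)
print("unguided clearance:", clearance(unguided, obstacle))
print("guided clearance:  ", clearance(guided, obstacle))
```

In this toy setting the unguided sampler reproduces the straight line through the obstacle, while the guided sampler pushes nearby waypoints out to roughly the clearance radius without any change to the noise predictor. That is the plug-and-play property the abstract describes. With a real learned model, the guidance would interact with a multimodal path prior rather than a single fixed path.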
