Uncertainty-Informed Active Perception for Open Vocabulary Object Goal Navigation
Utkarsh Bajpai, Julius Rückin, Cyrill Stachniss, Marija Popović
TL;DR
This work tackles object-goal navigation (ObjectNav) under semantic uncertainty by combining a probabilistic semantic relevance model, online probabilistic geometric-semantic mapping, and an uncertainty-aware frontier planner. Semantic relevance scores are derived from a Vision-Language Model queried with an ensemble of prompts, and the resulting per-pixel uncertainties are fused into the map with Bayesian-style updates, so that frontier-based planning can balance exploiting regions known to be relevant against exploring uncertain ones. The approach achieves ObjectNav performance competitive with state-of-the-art open-vocabulary methods without fixed hand-crafted prompts, while highlighting how brittle baseline methods are to prompt phrasing. The proposed training-free, uncertainty-informed framework advances robust open-vocabulary perception for indoor robot navigation and points toward future real-world deployment.
Abstract
Mobile robots exploring indoor environments increasingly rely on vision-language models to perceive high-level semantic cues in camera images, such as object categories. Such models offer the potential to substantially advance robot behaviour for tasks such as object-goal navigation (ObjectNav), where the robot must locate objects specified in natural language by exploring the environment. Current ObjectNav methods heavily depend on prompt engineering for perception and do not address the semantic uncertainty induced by variations in prompt phrasing. Ignoring semantic uncertainty can lead to suboptimal exploration, which in turn limits performance. Hence, we propose a semantic uncertainty-informed active perception pipeline for ObjectNav in indoor environments. We introduce a novel probabilistic sensor model for quantifying semantic uncertainty in vision-language models and incorporate it into a probabilistic geometric-semantic map to enhance spatial understanding. Based on this map, we develop a frontier exploration planner with an uncertainty-informed multi-armed bandit objective to guide efficient object search. Experimental results demonstrate that our method achieves ObjectNav success rates comparable to those of state-of-the-art approaches, without requiring extensive prompt engineering.
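The three components named in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation, which this summary does not spell out. It assumes that semantic uncertainty is the variance of relevance scores across prompt phrasings, that each map cell holds a Gaussian belief updated with a Kalman-style fusion step, and that frontiers are ranked with a standard upper-confidence-bound (UCB) bandit rule. All function names, the `beta` exploration weight, and the numeric scores are hypothetical.

```python
import math
from statistics import fmean, pvariance


def ensemble_relevance(scores):
    """Mean and variance of relevance scores over a prompt ensemble (assumed sensor model)."""
    mean = fmean(scores)
    var = max(pvariance(scores), 1e-6)  # floor keeps the measurement variance positive
    return mean, var


def fuse_cell(prior_mean, prior_var, meas_mean, meas_var):
    """Kalman-style Gaussian fusion of one map cell's belief with a new measurement."""
    gain = prior_var / (prior_var + meas_var)
    post_mean = prior_mean + gain * (meas_mean - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


def ucb_score(mean, var, beta=1.0):
    """UCB score: the mean rewards exploitation, the std-dev term rewards exploration."""
    return mean + beta * math.sqrt(var)


def select_frontier(frontiers, beta=1.0):
    """Pick the frontier name with the highest UCB score from {name: (mean, var)}."""
    return max(frontiers, key=lambda name: ucb_score(*frontiers[name], beta=beta))


# Made-up scores that one image region receives under four prompt phrasings.
mean, var = ensemble_relevance([0.62, 0.35, 0.71, 0.48])
cell_mean, cell_var = fuse_cell(0.5, 0.25, mean, var)

# Two hypothetical frontiers as (mean relevance, variance).
frontiers = {"hallway": (0.55, 0.01), "kitchen": (0.45, 0.09)}
explore_choice = select_frontier(frontiers, beta=1.0)  # uncertainty tips the choice
exploit_choice = select_frontier(frontiers, beta=0.0)  # mean relevance only
```

With `beta=1.0` the more uncertain kitchen frontier wins despite its lower mean, whereas `beta=0.0` reduces the rule to greedy exploitation and selects the hallway. This is the exploration-exploitation trade-off that the abstract attributes to the uncertainty-informed objective.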
