One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation
Finn Lukas Busch, Timon Homberger, Jesús Ortega-Peimbert, Quantao Yang, Olov Andersson
TL;DR
This work tackles real-time open-vocabulary multi-object navigation by building a reusable semantic belief map (OneMap) that accumulates CLIP-aligned patch features together with quantified uncertainty. It combines dense, patch-level semantic extraction via the SED/CLIP pipeline with a probabilistic 2D map update that accounts for depth-derived uncertainty and feature leakage, and uses Kalman-based fusion to produce a queryable map. Navigation is driven by a frontier-based exploration strategy that uses four semantic sub-maps and a CLIP-based similarity field to select informative frontiers and clusters, with consensus filtering from an object detector to confirm detections. The approach achieves state-of-the-art or competitive results on HM3D single- and multi-object zero-shot tasks and runs onboard a real robot on a Jetson Orin AGX, showing its practical value for mobile robots that need flexible, memory-enabled search across arbitrary objects.
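The Kalman-based fusion mentioned above can be illustrated with a minimal sketch. It assumes a single map cell that stores a feature mean and one scalar variance, and it assumes that observation variance grows quadratically with depth. The function names `depth_to_variance` and `fuse_cell` and the constants `base` and `growth` are hypothetical and chosen only for illustration; they are not taken from the paper or its code.

```python
import numpy as np

def depth_to_variance(depth_m, base=0.05, growth=0.02):
    """Hypothetical noise model: observations from farther away are less certain."""
    return base + growth * depth_m ** 2

def fuse_cell(mean, var, obs, obs_var):
    """Scalar-variance Kalman update of one cell's feature mean and variance."""
    gain = var / (var + obs_var)            # Kalman gain in [0, 1]
    new_mean = mean + gain * (obs - mean)   # move the mean toward the observation
    new_var = (1.0 - gain) * var            # uncertainty shrinks after fusion
    return new_mean, new_var

# Uninformed prior for one cell (4-D stand-in for a CLIP-aligned feature).
prior_mean = np.zeros(4)
prior_var = 1.0
observation = np.array([1.0, 0.0, 0.0, 0.0])

# Fuse the same observation as if seen from 1 m and from 5 m.
near_mean, near_var = fuse_cell(prior_mean, prior_var, observation, depth_to_variance(1.0))
far_mean, far_var = fuse_cell(prior_mean, prior_var, observation, depth_to_variance(5.0))

# Sanity case with equal prior and observation variance: the gain is exactly 0.5.
half_mean, half_var = fuse_cell(prior_mean, prior_var, observation, 1.0)
```

Under these assumptions the near observation pulls the cell's mean further toward the observed feature and leaves a smaller variance than the far one, which is the behavior a depth-aware belief map would rely on.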
Abstract
The capability to efficiently search for objects in complex environments is fundamental for many real-world robot applications. Recent advances in open-vocabulary vision models have resulted in semantically informed object navigation methods that allow a robot to search for an arbitrary object without prior training. However, these zero-shot methods have so far treated the environment as unknown for each consecutive query. In this paper we introduce a new benchmark for zero-shot multi-object navigation, allowing the robot to leverage information gathered from previous searches to find new objects more efficiently. To address this problem we build a reusable open-vocabulary feature map tailored for real-time object search. We further propose a probabilistic-semantic map update that mitigates common sources of error in semantic feature extraction, and we leverage this semantic uncertainty for informed multi-object exploration. We evaluate our method on a set of object navigation tasks both in simulation and on a real robot, running in real time on a Jetson Orin AGX. We demonstrate that it outperforms existing state-of-the-art approaches on both single- and multi-object navigation tasks. Additional videos, code and the multi-object navigation benchmark will be available at https://finnbsch.github.io/OneMap.
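As a rough illustration of why a reusable feature map helps with consecutive queries, the sketch below scores every cell of a stored 2D feature map against a text embedding using cosine similarity, then reuses the same map for a second query without any new observations. Random vectors stand in for CLIP embeddings, and `similarity_field`, the map size and the planted cells are all hypothetical choices made for this example, not the authors' implementation.

```python
import numpy as np

def similarity_field(feature_map, text_embedding, eps=1e-8):
    """Cosine similarity between each map cell (H x W x D) and a query vector (D)."""
    norms = np.linalg.norm(feature_map, axis=-1, keepdims=True)
    feats = feature_map / (norms + eps)
    query = text_embedding / (np.linalg.norm(text_embedding) + eps)
    return feats @ query  # shape (H, W)

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 8, 16))   # stand-in for accumulated map features

# Two stand-in text embeddings, e.g. for two different object queries.
query_a = rng.normal(size=16)
query_b = rng.normal(size=16)

# Plant one matching cell per query so the example has a known answer.
feature_map[2, 5] = 3.0 * query_a
feature_map[6, 1] = 0.5 * query_b

# The same map answers both queries; no re-exploration is needed in between.
field_a = similarity_field(feature_map, query_a)
field_b = similarity_field(feature_map, query_b)
best_a = tuple(int(i) for i in np.unravel_index(np.argmax(field_a), field_a.shape))
best_b = tuple(int(i) for i in np.unravel_index(np.argmax(field_b), field_b.shape))
```

Because cosine similarity ignores feature magnitude, both planted cells score close to 1 for their own query regardless of the scale factors used when planting them.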
