Where Did I Leave My Glasses? Open-Vocabulary Semantic Exploration in Real-World Semi-Static Environments
Benjamin Bogenberger, Oliver Harrison, Orrin Dahanaggamaarachchi, Lukas Brunke, Jingxing Qian, Siqi Zhou, Angela P. Schoellig
TL;DR
The paper presents an open-vocabulary semantic exploration framework for robots operating in semi-static environments. It combines a probabilistic change-detection map with a language-informed exploration strategy. The system maintains a dynamic scene belief consisting of an object library, a missing-object library, and a background, and it computes a task-specific exploration priority map that guides both object-goal navigation and map maintenance. Key contributions include a Bayesian stationarity model for object instances, robust data association based on expected views and ICP, and a per-object exploration map modulated by LLM-derived relevancy and semantic priors. The approach outperforms state-of-the-art baselines on public datasets and in real-world experiments, with higher success rates, better change-detection F1 scores, and real-time map updates in semi-static scenes.
Abstract
Robots deployed in real-world environments, such as homes, must not only navigate safely but also understand their surroundings and adapt to changes in the environment. To perform tasks efficiently, they must build and maintain a semantic map that accurately reflects the current state of the environment. Existing research on semantic exploration largely focuses on static scenes without persistent object-level instance tracking. In this work, we propose an open-vocabulary semantic exploration system for semi-static environments. Our system maintains a consistent map by building a probabilistic model of object instance stationarity, systematically tracking semi-static changes, and actively exploring areas that have not been visited for an extended period. In addition to active map maintenance, our approach combines the map's semantic richness with large language model (LLM)-based reasoning for open-vocabulary object-goal navigation. This enables the robot to search more efficiently by prioritizing contextually relevant areas. We compare our approach against state-of-the-art baselines using publicly available object navigation and mapping datasets, and we further demonstrate transferability in three real-world environments. Our approach outperforms the compared baselines in both success rate and search efficiency for object-navigation tasks and handles changes more reliably when mapping semi-static environments. In real-world experiments, our system detects 95% of map changes on average, improving efficiency by more than 29% compared to random and patrol strategies.
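To make the idea of a probabilistic model of object instance stationarity more concrete, the following is a minimal sketch of a binary Bayes filter in log-odds form. It is an illustration under stated assumptions, not the model used in the paper: the class name `StationarityBelief`, the uniform prior, and the two observation likelihoods (`p_match_if_stationary`, `p_match_if_moved`) are hypothetical choices made only for this example.

```python
import math


def logit(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


class StationarityBelief:
    """Belief that one object instance is still where the map says it is.

    Hypothetical illustration: the likelihood values below are assumed,
    not taken from the paper.
    """

    def __init__(
        self,
        prior: float = 0.5,
        p_match_if_stationary: float = 0.9,  # assumed P(match | object stayed)
        p_match_if_moved: float = 0.2,       # assumed P(match | object moved)
    ) -> None:
        self.log_odds = logit(prior)
        # Log-likelihood ratios for the two possible observation outcomes.
        self._l_match = math.log(p_match_if_stationary / p_match_if_moved)
        self._l_miss = math.log(
            (1.0 - p_match_if_stationary) / (1.0 - p_match_if_moved)
        )

    @property
    def probability(self) -> float:
        """Current probability that the object is stationary."""
        return sigmoid(self.log_odds)

    def update(self, matched: bool) -> float:
        """Fuse one observation: was the object re-observed at its mapped pose?"""
        self.log_odds += self._l_match if matched else self._l_miss
        return self.probability


if __name__ == "__main__":
    belief = StationarityBelief()
    for matched in (True, True, False):
        print(f"matched={matched}: P(stationary)={belief.update(matched):.3f}")
```

In this sketch, each revisit either raises or lowers the belief depending on whether the object was matched at its mapped location. A system of the kind described above could, in principle, move an instance to a missing-object library once this probability falls below a chosen threshold; the threshold and the matching step itself are outside the scope of this example.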
