Where Did I Leave My Glasses? Open-Vocabulary Semantic Exploration in Real-World Semi-Static Environments

Benjamin Bogenberger, Oliver Harrison, Orrin Dahanaggamaarachchi, Lukas Brunke, Jingxing Qian, Siqi Zhou, Angela P. Schoellig

TL;DR

The paper presents an open-vocabulary semantic exploration framework for robots operating in semi-static environments, combining a probabilistic change-detection map with a language-informed exploration strategy. It maintains a dynamic scene belief consisting of an object library, a missing-object library, and a background, and computes a task-specific exploration priority map that guides both object-goal navigation and map maintenance. Key contributions include a Bayesian stationarity model for object instances, robust data association based on expected views and ICP, and a per-object exploration map modulated by LLM-derived relevancy and semantic priors. On public datasets and in real-world experiments, the approach outperforms state-of-the-art baselines, achieving higher success rates and better change-detection F1 while updating the map in real time in semi-static scenes.

Abstract

Robots deployed in real-world environments, such as homes, must not only navigate safely but also understand their surroundings and adapt to changes in the environment. To perform tasks efficiently, they must build and maintain a semantic map that accurately reflects the current state of the environment. Existing research on semantic exploration largely focuses on static scenes without persistent object-level instance tracking. In this work, we propose an open-vocabulary, semantic exploration system for semi-static environments. Our system maintains a consistent map by building a probabilistic model of object instance stationarity, systematically tracking semi-static changes, and actively exploring areas that have not been visited for an extended period. In addition to active map maintenance, our approach leverages the map's semantic richness with large language model (LLM)-based reasoning for open-vocabulary object-goal navigation. This enables the robot to search more efficiently by prioritizing contextually relevant areas. We compare our approach against state-of-the-art baselines using publicly available object navigation and mapping datasets, and we further demonstrate real-world transferability in three real-world environments. Our approach outperforms the compared baselines in both success rate and search efficiency for object-navigation tasks and can more reliably handle changes in mapping semi-static environments. In real-world experiments, our system detects 95% of map changes on average, improving efficiency by more than 29% as compared to random and patrol strategies.
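The stationarity model is only summarized here, so the following is a minimal sketch of the general idea rather than the authors' formulation. It assumes a Beta belief over each object's stationarity, pseudo-count updates on observation, and a label-dependent decay while the object is out of view. The class name `StationarityBelief`, the `DECAY_RATE` values, and the update rules are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical decay rates (pseudo-counts per second of non-observation).
# These values are illustrative assumptions, not taken from the paper.
DECAY_RATE = {"dynamic": 0.05, "static": 0.005}


@dataclass
class StationarityBelief:
    """Beta(alpha, beta) belief that an object is still where the map says."""

    label: str          # "dynamic" or "static" (assumed semantic prior)
    alpha: float = 1.0  # pseudo-count of evidence for "still in place"
    beta: float = 1.0   # pseudo-count of evidence for "moved or missing"

    def expected(self) -> float:
        """Mean of the Beta belief, standing in for a stationarity score."""
        return self.alpha / (self.alpha + self.beta)

    def observe(self, consistent: bool) -> None:
        """Bayesian count update when the object's region is in view."""
        if consistent:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def decay(self, dt: float) -> None:
        """Lose confidence while unobserved, faster for dynamic objects."""
        self.beta += DECAY_RATE[self.label] * dt


chair = StationarityBelief("dynamic")
table = StationarityBelief("static")
for belief in (chair, table):
    for _ in range(3):
        belief.observe(consistent=True)
    belief.decay(dt=60.0)  # one minute out of view
```

Under these assumptions, both objects start from the same score after three consistent observations, and the chair's score drops further than the coffee table's during the same unobserved interval. A later consistent observation raises the score again.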

Paper Structure

This paper contains 26 sections, 1 equation, 10 figures, 6 tables, and 1 algorithm.

Figures (10)

  • Figure 1: An illustration of our proposed open-vocabulary semantic exploration approach for semi-static environments, where objects can be shifted, removed, or reintroduced. To account for such changes, the system explicitly maintains a stationarity score for each object instance and actively revisits regions of the map that are likely outdated. This enables the construction of an up-to-date metric-semantic map, which we use to prioritize contextually relevant areas during (unseen) object-goal navigation in semi-static scenes. An overview of our work, including real-world experiments, can be found on our website https://utiasdsl.github.io/semi-static-semantic-exploration/ and in our video http://tiny.cc/sem-explor-semi-static.
  • Figure 2: Overview of our proposed system. We extract object candidates from the current pose and RGB-D frame (green). These are associated with objects in the semantic map, which is updated based on a probabilistic consistency estimate (red). Based on the scene belief, we build a semantic exploration priority map, indicating which map regions are relevant to the current task, either maintaining an up-to-date map or object-goal navigation (orange). Finally, the robot leverages the priority map to select and navigate to sampled positions (blue).
  • Figure 3: Different decay of the stationarity score $\mathbb{E}[v_i]$ for an object labeled $\texttt{dynamic}$ (Chair) and an object labeled $\texttt{static}$ (Coffee table). The objects are observed only at the times their stationarity score increases; otherwise, they are not in the robot's view.
  • Figure 4: Illustration of how we sample from the exploration priority map. Last reached target waypoint $\mathbf{w}^*_{0}$ and iteratively sampled candidate waypoints $\mathbf{w}_{[1,3]|0}$ ($M=3$) (left). The closest candidate becomes the next target waypoint $\mathbf{w}^*_{1}$ (middle). Trajectory $\mathbf{w}^*_{[0,750]}$ produced after applying this sampling strategy for $750$ steps (right).
  • Figure 5: Example of our method (left) and DynaMem [liu2024dynamem] (right) searching for an unseen knife (top) and a moved chair (bottom). Our method checks the dining table, then kitchen and bedroom, while DynaMem explores randomly. Robot path and start shown in gray. Goal object marked with a yellow star; prior location with a blue cross.
  • ...and 5 more figures
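
The waypoint selection described in the Figure 4 caption (draw $M$ candidates from the exploration priority map, then move to the closest one) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the priority map is taken to be a 2D grid of non-negative weights, candidates are drawn independently with probability proportional to priority, and distance is Euclidean in grid cells. The function name `sample_next_waypoint` and its parameters are hypothetical.

```python
import math
import random


def sample_next_waypoint(priority, current, m=3, rng=None):
    """Pick the next target cell from a 2D priority grid.

    priority: list of rows with non-negative weights (assumed representation).
    current:  (row, col) of the last reached waypoint.
    m:        number of candidate waypoints to draw (M in the caption).
    Returns (target, candidates).
    """
    rng = rng or random.Random()
    cells = [(r, c) for r, row in enumerate(priority) for c in range(len(row))]
    weights = [float(w) for row in priority for w in row]
    if sum(weights) <= 0.0:
        raise ValueError("priority map has no positive weight to sample from")

    # Draw M candidates with probability proportional to cell priority.
    candidates = rng.choices(cells, weights=weights, k=m)

    # The candidate closest to the current waypoint becomes the next target.
    target = min(candidates, key=lambda cell: math.dist(cell, current))
    return target, candidates


# Usage: two high-priority regions, robot currently at the top-left corner.
grid = [[0.0] * 5 for _ in range(5)]
grid[1][1] = 1.0
grid[4][4] = 1.0
target, candidates = sample_next_waypoint(grid, (0, 0), m=3, rng=random.Random(0))
```

Choosing the nearest of several priority-weighted samples biases the robot toward high-priority regions while discouraging long back-and-forth traversals, which is one plausible reading of why the caption's trajectory stays locally coherent.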