
IGV-RRT: Prior-Real-Time Observation Fusion for Active Object Search in Changing Environments

Wei Zhang, Ping Gong, Yujie Wang, Minghui Bai, Rongfeng Ye, Yinchuan Wang, Yachao Wang, Leilei Yao, Teng Chen, Chen Sun, Chaoqun Wang

Abstract

Object Goal Navigation (ObjectNav) in temporally changing indoor environments is challenging because object relocation can invalidate historical scene knowledge. To address this issue, we propose a probabilistic planning framework that combines uncertainty-aware scene priors with online target relevance estimates derived from a Vision Language Model (VLM). The framework consists of a dual-layer semantic mapping module and a real-time planner. The mapping module includes an Information Gain Map (IGM), built from a 3D scene graph (3DSG) during prior exploration, which models object co-occurrence relations and provides global guidance on likely target regions. It also maintains a VLM score map (VLM-SM) that fuses confidence-weighted semantic observations into the map for local validation of the current scene. Based on these two cues, we develop a planner that jointly exploits information gain and semantic evidence for online decision making. The resulting planner, IGV-RRT, biases tree expansion toward semantically salient regions that have both high prior likelihood and strong online relevance, while preserving kinematic feasibility through gradient-based analysis. Simulation and real-world experiments demonstrate that the proposed method effectively mitigates the impact of object rearrangement, achieving higher search efficiency and success rates than representative baselines in complex indoor environments.
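The VLM-SM idea of fusing confidence-weighted observations can be illustrated with a small grid. The sketch below assumes each cell stores a fused score and an accumulated confidence, merged by a confidence-weighted running average; this update rule, the class name `VLMScoreMap`, and its method `fuse` are hypothetical and are not taken from the paper, whose exact fusion rule is not given in this excerpt.

```python
import numpy as np


class VLMScoreMap:
    """Hypothetical 2D grid of VLM target-relevance scores fused over time.

    Assumption: every cell keeps a fused score and an accumulated confidence,
    and new observations are merged by a confidence-weighted running average.
    """

    def __init__(self, height: int, width: int) -> None:
        self.score = np.zeros((height, width))
        self.conf = np.zeros((height, width))

    def fuse(self, rows, cols, obs_score, obs_conf) -> None:
        """Merge one observation into the listed cells.

        rows, cols: indices of observed cells (assumed unique within one call).
        obs_score: per-cell relevance score reported by the VLM.
        obs_conf: per-cell confidence weight of this observation (>= 0).
        """
        rows = np.asarray(rows)
        cols = np.asarray(cols)
        obs_score = np.asarray(obs_score, dtype=float)
        obs_conf = np.asarray(obs_conf, dtype=float)

        old_score = self.score[rows, cols]
        old_conf = self.conf[rows, cols]
        total = old_conf + obs_conf
        # Guard against division by zero where no confidence has accumulated yet.
        denom = np.where(total > 0, total, 1.0)
        fused = (old_conf * old_score + obs_conf * obs_score) / denom
        self.score[rows, cols] = np.where(total > 0, fused, old_score)
        self.conf[rows, cols] = total


# Usage: a cell is first seen with high relevance, then again with a more
# confident, lower-relevance observation (e.g. after the object was moved).
vlm_map = VLMScoreMap(4, 4)
vlm_map.fuse([1], [2], [0.8], [1.0])
vlm_map.fuse([1], [2], [0.2], [3.0])
# Fused score is (1.0 * 0.8 + 3.0 * 0.2) / 4.0, about 0.35; confidence is 4.0.
```

Weighting by accumulated confidence lets repeated, reliable observations override an early, low-confidence reading, which is the behavior a map for local validation of a changed scene needs.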

Paper Structure

This paper contains 10 sections, 14 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Active object search in a time-varying indoor scene. The static IGM prior guides global navigation toward a high-likelihood region for the target. Online observations are processed by BLIP-2 and fused into a VLM-SM to refine local motion toward the true target. The IGM-only endpoint indicates prior bias.
  • Figure 2: Overview of the proposed active search pipeline. The framework combines an IGM derived from the scene graph and commonsense knowledge with an incrementally updated VLM map informed by RGB-D observations and offline LLM reasoning. These two cues jointly guide local sub-goal selection, global target selection, and path execution.
  • Figure 3: Static IGM construction. The figure illustrates the construction of the IGM from a 3DSG through ConceptNet-based semantic association and GMM-based spatial propagation. Panels (a) and (b) show the same IGM at different times, emphasizing that once constructed, the map remains unchanged even when object arrangements in the scene vary over time.
  • Figure 4: VLM correction and multi-prompting. (a) Corrective role of the VLM score map under a biased prior. When the prior indicates an incorrect target region, the robot is steered toward areas with higher VLM scores, which reflect a higher likelihood of target presence, leading to effective progress toward the true object location. (b) and (c) Impact of the prompting strategy. With a single prompt, the score response shows weak regional contrast and provides insufficient guidance for navigation. With multi-prompt querying, the score map exhibits stronger spatial discriminability, providing clearer guidance and enabling the robot to reach the target location more effectively.
  • Figure 5: Utility-based frontier scoring with explored-region gating in IGV-RRT. IGV-RRT scores candidate frontiers by combining distance, IGM entropy, and VLM-SM evidence into a joint utility. An explored-region mask reduces the utility of frontiers in previously observed areas to the distance heuristic alone, encouraging selection of informative, unexplored frontiers. Without this gating, the planner may repeatedly choose already observed areas and miss the target. With the mask, it prefers frontier A, which actually leads toward the target.
  • ...and 2 more figures
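The explored-region gating described for Figure 5 can be sketched as follows, assuming the joint utility is a linear weighted sum of a negative distance cost, an IGM term, and a VLM-SM term. The linear form, the weights, the `Frontier` fields, and the helpers `frontier_utility` and `select_frontier` are all hypothetical; the paper's actual utility is not given in this excerpt. The toy frontiers mirror the caption's point: without the mask, the already-observed frontier B wins on stale evidence; with the mask, the unexplored frontier A is chosen.

```python
import math
from dataclasses import dataclass


@dataclass
class Frontier:
    name: str
    x: float
    y: float
    igm_gain: float   # hypothetical IGM entropy / information-gain value here
    vlm_score: float  # hypothetical VLM-SM value here
    explored: bool    # True if the frontier lies inside the explored-region mask


def frontier_utility(f: Frontier, robot_xy, w_dist=1.0, w_igm=1.0, w_vlm=1.0,
                     gate_explored=True) -> float:
    """Joint utility: distance cost plus IGM and VLM-SM terms (assumed linear form)."""
    dist = math.hypot(f.x - robot_xy[0], f.y - robot_xy[1])
    utility = -w_dist * dist  # nearer frontiers score higher
    if gate_explored and f.explored:
        # Explored-region gating: keep only the distance heuristic.
        return utility
    return utility + w_igm * f.igm_gain + w_vlm * f.vlm_score


def select_frontier(frontiers, robot_xy, **kwargs) -> Frontier:
    """Pick the frontier with the highest joint utility."""
    return max(frontiers, key=lambda f: frontier_utility(f, robot_xy, **kwargs))


frontiers = [
    Frontier("A", 3.0, 0.0, igm_gain=1.0, vlm_score=2.0, explored=False),
    Frontier("B", 1.0, 0.0, igm_gain=2.0, vlm_score=3.0, explored=True),
    Frontier("C", 2.0, 0.0, igm_gain=0.3, vlm_score=0.1, explored=False),
]
with_mask = select_frontier(frontiers, (0.0, 0.0))
without_mask = select_frontier(frontiers, (0.0, 0.0), gate_explored=False)
# with_mask is frontier A (utility 0.0); without_mask is frontier B (utility 4.0).
```

Gating by falling back to the distance term, rather than discarding explored frontiers outright, keeps them available as cheap fallbacks when no unexplored frontier remains.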