OpenIN: Open-Vocabulary Instance-Oriented Navigation in Dynamic Domestic Environments

Yujie Tang, Meiling Wang, Yinan Deng, Zibo Zheng, Jingchuan Deng, Yufeng Yue

TL;DR

OpenIN tackles the challenge of navigating to specific object instances in dynamic domestic environments by introducing a dynamic Carrier-Relationship Scene Graph (CRSG) that encodes carried-by relationships and object carriers. The navigation strategy models the search as a Markov Decision Process, selecting targets via multimodal similarities and using a large language model for commonsense-driven exploration when necessary; CRSG updates are performed continually from robot observations. The approach integrates open-vocabulary instance mapping with LLM-assisted reasoning and visual-language features, enabling robust instance discrimination and responsive adaptation as objects move or change carriers. Across both simulated and real-world tests, CRSG updates improved navigation efficiency to moved targets, and ablations confirmed the contributions of each component to performance gains.

Abstract

In everyday domestic settings, frequently used objects such as cups often have no fixed position, exist as multiple instances of the same category, and change carriers frequently. As a result, it is challenging for a robot to navigate efficiently to a specific instance. To tackle this challenge, the robot must continuously capture scene changes and update its plans accordingly. However, current object navigation approaches primarily focus on the semantic level and lack the ability to update the scene representation dynamically. In contrast, this paper captures the relationships between frequently used objects and their static carriers. It constructs an open-vocabulary Carrier-Relationship Scene Graph (CRSG) and updates the carrying status during robot navigation to reflect dynamic changes in the scene. Based on the CRSG, we further propose an instance navigation strategy that models the navigation process as a Markov Decision Process. At each step, decisions are informed by the commonsense knowledge of a Large Language Model and by visual-language feature similarity. We designed a series of long-sequence navigation tasks for frequently used everyday items in the Habitat simulator. The results demonstrate that, by updating the CRSG, the robot can efficiently navigate to moved targets. Additionally, we deployed our algorithm on a real robot and validated its practical effectiveness. The project page can be found here: https://OpenIN-nav.github.io.
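
To make the idea of updating the carrying status concrete, the following is a minimal sketch of a carrier-relationship graph, not the authors' implementation. It assumes instances can be identified by plain string ids (the paper relies on open-vocabulary visual-language features instead), and the names `Carrier`, `CarrierGraph`, `observe`, and `locate` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Carrier:
    """A static carrier (e.g. a table) and the instance ids currently on it."""
    name: str
    carried: set = field(default_factory=set)


class CarrierGraph:
    """Hypothetical, simplified stand-in for a carrier-relationship scene graph."""

    def __init__(self):
        self.carriers = {}   # carrier name -> Carrier
        self.location = {}   # instance id -> name of the carrier it was last seen on

    def observe(self, carrier_name, seen_ids):
        """Update the carrying status from one observation of a carrier.

        Returns the set of instances that were recorded on this carrier
        before but are missing from the current observation.
        """
        carrier = self.carriers.setdefault(carrier_name, Carrier(carrier_name))
        seen = set(seen_ids)
        disappeared = carrier.carried - seen
        for inst in disappeared:
            # The instance left this carrier; its location is now unknown.
            if self.location.get(inst) == carrier_name:
                del self.location[inst]
        for inst in seen:
            old = self.location.get(inst)
            if old is not None and old != carrier_name:
                # The instance moved here from another carrier.
                self.carriers[old].carried.discard(inst)
            self.location[inst] = carrier_name
        carrier.carried = seen
        return disappeared

    def locate(self, inst):
        """Return the last known carrier of an instance, or None if unknown."""
        return self.location.get(inst)
```

In this sketch, a target whose `locate` result is `None` would be the case where a navigation strategy has to fall back to exploration, which the paper guides with LLM commonsense knowledge; that part is not modeled here.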

Paper Structure

This paper contains 18 sections, 14 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: The OpenIN framework consists of two main modules. The Scene Graph Construction module focuses on constructing the scene graph that describes the carrier-carried relationships. The Graph Updating and Navigation Strategy module is responsible for executing cognitive navigation based on user instructions, following the proposed navigation strategy, while updating the scene graph in the process.
  • Figure 2: A top-down schematic of CRSG Adaptation.
  • Figure 3: Query Results for Some Carried Instances on the Offline Map.
  • Figure 4: SPL of different methods for long-sequence object navigation (horizontal axis: object number; vertical axis: SPL).
  • Figure 5: Two long-sequence instance navigation results of our method, with scene 2 on the left and scene 5 on the right. Updates to task-relevant objects in the CRSG map are highlighted in small frames: red objects within red borders indicate appearances, blue objects within blue borders denote disappearances, and green objects within green borders represent new additions to the CRSG. During the initial exploration and navigation to the first object, the robot updated most of the CRSG area, enabling efficient navigation to subsequent objects by utilizing the known target locations.
  • ...and 2 more figures