OpenObject-NAV: Open-Vocabulary Object-Oriented Navigation Based on Dynamic Carrier-Relationship Scene Graph

Yujie Tang, Meiling Wang, Yinan Deng, Zibo Zheng, Jiagui Zhong, Yufeng Yue

TL;DR

This paper captures the relationships between frequently used objects and their static carriers, constructs an open-vocabulary Carrier-Relationship Scene Graph (CRSG), and proposes an instance navigation strategy that models the navigation process as a Markov Decision Process.

Abstract

In everyday life, frequently used objects like cups often have unfixed positions and multiple instances within the same category, and their carriers frequently change as well. As a result, it is challenging for a robot to navigate efficiently to a specific instance. To tackle this challenge, the robot must continuously capture scene changes and update its plans. However, current object navigation approaches focus primarily on the semantic level and lack the ability to dynamically update the scene representation. This paper captures the relationships between frequently used objects and their static carriers. It constructs an open-vocabulary Carrier-Relationship Scene Graph (CRSG) and updates the carrying status during robot navigation to reflect dynamic changes in the scene. Based on the CRSG, we further propose an instance navigation strategy that models the navigation process as a Markov Decision Process. At each step, decisions are informed by the commonsense knowledge of a Large Language Model and by visual-language feature similarity. We designed a series of long-sequence navigation tasks for frequently used everyday items in the Habitat simulator. The results demonstrate that, by updating the CRSG, the robot can efficiently navigate to moved targets. Additionally, we deployed our algorithm on a real robot and validated its practical effectiveness.
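To make the idea of a carrier relationship and its "carrying status" more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the class names (`ObjectNode`, `CarrierNode`, `CarrierRelationshipGraph`), the `observe` method, and the update rule (a fresh observation of a carrier replaces what that carrier was believed to hold) are all illustrative assumptions. The open-vocabulary features, the LLM queries, and the Markov Decision Process described in the abstract are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ObjectNode:
    """A frequently used, movable object (hypothetical structure)."""
    object_id: str
    label: str  # free-text description, e.g. "red alarm clock"


@dataclass
class CarrierNode:
    """A static carrier such as a table or shelf (hypothetical structure)."""
    carrier_id: str
    label: str
    carried: Dict[str, ObjectNode] = field(default_factory=dict)


class CarrierRelationshipGraph:
    """Toy graph mapping static carriers to the objects they currently hold."""

    def __init__(self) -> None:
        self.carriers: Dict[str, CarrierNode] = {}

    def add_carrier(self, carrier_id: str, label: str) -> None:
        self.carriers[carrier_id] = CarrierNode(carrier_id, label)

    def observe(self, carrier_id: str, observed: List[ObjectNode]) -> None:
        """Assumed update rule: what the robot sees on a carrier replaces
        the previously stored carrying status of that carrier."""
        observed_ids = {obj.object_id for obj in observed}
        # An object seen here can no longer be on any other carrier.
        for other in self.carriers.values():
            if other.carrier_id != carrier_id:
                for object_id in observed_ids:
                    other.carried.pop(object_id, None)
        # Objects no longer seen on this carrier are dropped from it.
        self.carriers[carrier_id].carried = {o.object_id: o for o in observed}

    def carrier_of(self, object_id: str) -> Optional[str]:
        """Return the id of the carrier believed to hold the object, if any."""
        for carrier in self.carriers.values():
            if object_id in carrier.carried:
                return carrier.carrier_id
        return None


graph = CarrierRelationshipGraph()
graph.add_carrier("desk", "wooden desk")
graph.add_carrier("shelf", "book shelf")
clock = ObjectNode("clock_red", "red alarm clock")
graph.observe("desk", [clock])   # initial mapping: clock on the desk
graph.observe("shelf", [clock])  # later the robot sees it on the shelf
```

After the second observation, `graph.carrier_of("clock_red")` returns `"shelf"` and the desk no longer lists the clock, which is the kind of carrying-status update the abstract refers to.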

Paper Structure (15 sections, 11 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: The robot executes long-sequence, multi-modal, and multi-type daily object navigation commands based on a dynamic carrier-relationship scene graph. First, it successfully navigates to the displaced red alarm clock, eliminating interference from a blue alarm clock of the same type along the way. Next, based on the user's request, it navigates to a game controller. During these tasks, the robot observes a new black cup, which is added to the scene graph. This update enables efficient point-to-point navigation for the third task.
  • Figure 2: The OpenObject-NAV system framework consists of two main modules. The Scene Graph Construction module focuses on constructing the carrier-relationship scene graph. The Graph Updating and Navigation Strategy module is responsible for executing cognitive navigation based on user instructions, following the proposed navigation strategy, while updating the scene graph in the process.
  • Figure 3: Static Object Query Experiment: Comparison of Target Object Query Results on the Offline Map.
  • Figure 4: Visualization of a long-sequence instance navigation result in scene 2, where "Point to Point" denotes shortest-path navigation.
  • Figure 5: The first figure presents the SPL results from the Long-sequence section, while the second and third figures show the results of the ablation experiments with and without CRSG updates from the Ablation section.
  • ...and 1 more figure
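Figure 5 reports SPL. Assuming this refers to the standard Success weighted by Path Length metric used in embodied navigation, it averages, over all episodes, the success indicator multiplied by the ratio of the shortest path length to the longer of the actual and shortest path lengths. The sketch below shows that computation; the function name `spl` and the tuple layout of an episode are illustrative choices, not taken from the paper.

```python
from typing import Iterable, Tuple

# One episode: (success, shortest_path_length, actual_path_length)
Episode = Tuple[bool, float, float]


def spl(episodes: Iterable[Episode]) -> float:
    """Success weighted by Path Length, assuming the standard definition."""
    total = 0.0
    count = 0
    for success, shortest, actual in episodes:
        count += 1
        denominator = max(actual, shortest)
        # Failed episodes contribute 0; zero-length paths are skipped
        # to avoid dividing by zero.
        if success and denominator > 0:
            total += shortest / denominator
    return total / count if count else 0.0


# One success with a path twice the optimal length, plus one failure.
score = spl([(True, 5.0, 10.0), (False, 4.0, 8.0)])
print(score)  # 0.25
```

A score of 1.0 means every episode succeeded along a shortest path; detours and failures both pull the score toward 0.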