Exploring the Reliability of Foundation Model-Based Frontier Selection in Zero-Shot Object Goal Navigation

Shuaihang Yuan, Halil Utku Unlu, Hao Huang, Congcong Wen, Anthony Tzes, Yi Fang

TL;DR

A novel method for reliable frontier selection in Zero-Shot Object Goal Navigation (ZS-OGN) that uses foundation models to improve the commonsense reasoning of robotic navigation systems in indoor environments, and introduces a multi-expert decision framework to address the nonsensical or irrelevant reasoning often seen in foundation model-based systems.

Abstract

In this paper, we present a novel method for reliable frontier selection in Zero-Shot Object Goal Navigation (ZS-OGN), which enhances robotic navigation systems with foundation models to improve commonsense reasoning in indoor environments. Our approach introduces a multi-expert decision framework to address the nonsensical or irrelevant reasoning often seen in foundation model-based systems. The method comprises two key components: Diversified Expert Frontier Analysis (DEFA) and Consensus Decision Making (CDM). DEFA uses three expert models, focused on furniture arrangement, room type analysis, and visual scene reasoning, while CDM aggregates their outputs, prioritizing unanimous or majority consensus to make more reliable decisions. Our method achieves state-of-the-art performance on the RoboTHOR and HM3D datasets, excels at navigating toward objects or goals not seen during training, and outperforms various baselines, showcasing its adaptability to dynamic real-world conditions and superior generalization capabilities.
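
The consensus rule described for CDM can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: each expert's output is assumed to be the index of a single chosen frontier, the expert names (furniture, room_type, visual_scene) are hypothetical labels for the three DEFA experts, and because the abstract does not say what happens when all three experts disagree, deferring to one designated fallback expert is a hypothetical choice made only for this example.

```python
from collections import Counter
from typing import Dict, Tuple


def consensus_decision(
    votes: Dict[str, int],
    fallback_expert: str = "visual_scene",  # hypothetical tie-breaking expert
) -> Tuple[int, str]:
    """Aggregate per-expert frontier choices by unanimous/majority consensus.

    `votes` maps a (hypothetical) expert name to the index of the frontier
    that expert selected. Returns the chosen frontier index and a label
    describing how the decision was reached.
    """
    if not votes:
        raise ValueError("at least one expert vote is required")
    counts = Counter(votes.values())
    frontier, n_agree = counts.most_common(1)[0]
    if n_agree == len(votes):
        return frontier, "unanimous"
    if n_agree > len(votes) / 2:
        return frontier, "majority"
    # No majority (e.g. three experts, three different frontiers). The abstract
    # does not specify this case; deferring to one designated expert is an
    # assumption made only for this sketch.
    return votes[fallback_expert], "fallback"
```

For example, `consensus_decision({"furniture": 2, "room_type": 2, "visual_scene": 0})` returns `(2, "majority")`. Returning the decision type alongside the frontier lets a caller log how often the experts actually agree.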

Paper Structure

This paper contains 21 sections, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Instances of nonsensical or irrelevant reasoning during frontier selection in Zero-Shot Object Goal Navigation. Green text indicates a correct understanding of the scene, while red text marks reasoning that contradicts human intuition.
  • Figure 2: Workflow of RF-NAV, the proposed system for Zero-Shot Object Goal Navigation (ZS-OGN). The process begins with RGB and depth observations, from which a semantic map containing identified objects and room labels is built. This map informs the Diversified Expert Frontier Analysis (DEFA) and the subsequent Consensus Decision Making (CDM), which select the most viable frontier or goal, here exemplified by the search for a 'Toilet'. The chosen goal is then passed to the Local Navigation Policy, which determines the actions the robot takes to explore the unknown environment.
  • Figure 3: Success rates for ZS-OGN across twelve target goal categories, comparing our proposed method, our baseline method, and ESC [zhou2023esc].
  • Figure 4: Comparison of the paths to target objects generated by our proposed method and by ESC [zhou2023esc]. Paths generated by our method are more direct and efficient; instances of zigzagging motion are marked with red ellipses.
  • Figure 5: Distribution of the average number of actions needed to complete zero-shot OGN for different target objects, comparing our method with ESC [zhou2023esc].
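
The Figure 2 workflow selects among frontiers, i.e. boundaries between explored free space and unexplored space in the robot's map. The Python sketch below shows the classic way such candidates can be extracted from a 2D occupancy grid, giving expert models like those in DEFA a discrete set of options to rank. It is an illustrative assumption, not RF-NAV's mapping code: the cell encoding (0 free, 1 occupied, -1 unknown), the use of 4-connectivity for both the frontier test and the clustering, and all function names are hypothetical.

```python
from collections import deque
from typing import Iterator, List, Sequence, Set, Tuple

# Hypothetical cell encoding for a 2D occupancy grid.
FREE, OCCUPIED, UNKNOWN = 0, 1, -1
Cell = Tuple[int, int]
Grid = Sequence[Sequence[int]]


def _neighbors(r: int, c: int, rows: int, cols: int) -> Iterator[Cell]:
    """Yield the in-bounds 4-connected neighbours of (r, c)."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def frontier_cells(grid: Grid) -> Set[Cell]:
    """Free cells that touch at least one unknown cell."""
    if not grid:
        return set()
    rows, cols = len(grid), len(grid[0])
    return {
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if grid[r][c] == FREE
        and any(grid[nr][nc] == UNKNOWN for nr, nc in _neighbors(r, c, rows, cols))
    }


def frontier_clusters(grid: Grid) -> List[List[Cell]]:
    """Group frontier cells into 4-connected clusters (one candidate frontier each)."""
    cells = frontier_cells(grid)
    if not cells:
        return []
    rows, cols = len(grid), len(grid[0])
    seen: Set[Cell] = set()
    clusters: List[List[Cell]] = []
    for start in sorted(cells):  # sorted for a deterministic cluster order
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        cluster: List[Cell] = []
        while queue:  # breadth-first flood fill over frontier cells only
            cell = queue.popleft()
            cluster.append(cell)
            for nb in _neighbors(*cell, rows, cols):
                if nb in cells and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(cluster))
    return clusters


def centroid(cluster: List[Cell]) -> Tuple[float, float]:
    """Mean (row, col) of a cluster, usable as a candidate navigation goal."""
    n = len(cluster)
    return (sum(r for r, _ in cluster) / n, sum(c for _, c in cluster) / n)
```

On the grid `[[0, 0, -1], [0, 1, -1], [0, 0, 0]]`, `frontier_clusters` returns `[[(0, 1)], [(2, 2)]]`: two separate candidate frontiers, each of which an expert could then score.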