Multi-Floor Zero-Shot Object Navigation Policy

Lingfeng Zhang; Hao Wang; Erjia Xiao; Xinyao Zhang; Qiang Zhang; Zixuan Jiang; Renjing Xu

TL;DR

This work tackles multi-floor object navigation by introducing MFNP, a framework that integrates LLM-based planning, VLM-based verification, and explicit inter-floor navigation to extend Zero-Shot ObjectNav beyond single floors. It constructs a semantic map from RGB-D data, generates candidate waypoints, and uses a multi-component policy to decide when to ascend stairs and explore other floors, guided by a time-aware score $N_{MFNP}$. The approach achieves state-of-the-art results on HM3D and MP3D, with notable gains in success rate and exploration efficiency, and is validated through ablations and a real-world Unitree robot demonstration. The results highlight the importance of vertical spatial reasoning in vision-based navigation and point to future work in dataset expansion and end-to-end training for multi-floor environments.

Abstract

Object navigation in multi-floor environments is a formidable challenge in robotics, requiring sophisticated spatial reasoning and adaptive exploration strategies. Traditional approaches have primarily focused on single-floor scenarios, overlooking the complexities introduced by multi-floor structures. To address these challenges, we propose a Multi-floor Navigation Policy (MFNP) and implement it in Zero-Shot object navigation tasks. Our framework comprises three key components: (i) the Multi-floor Navigation Policy, which enables an agent to explore across multiple floors; (ii) Multi-modal Large Language Models (MLLMs) for reasoning during navigation; and (iii) Inter-Floor Navigation, which ensures efficient floor transitions. We evaluate MFNP on the Habitat-Matterport 3D (HM3D) and Matterport 3D (MP3D) datasets, both of which include multi-floor scenes. Our experimental results show that MFNP significantly outperforms existing methods in Zero-Shot object navigation, achieving higher success rates and improved exploration efficiency. Ablation studies further highlight the effectiveness of each component in addressing the unique challenges of multi-floor navigation. We also conducted real-world experiments to evaluate the feasibility of our policy: with MFNP deployed, a Unitree quadruped robot successfully navigated across floors and found the target object in a completely unseen environment. By introducing MFNP, we offer a new paradigm for tackling complex, multi-floor environments in object navigation tasks, opening avenues for future research on vision-based navigation in realistic, multi-floor settings.

Paper Structure (25 sections, 3 equations, 5 figures, 2 tables)

INTRODUCTION
RELATED WORK
Object Navigation
Large Models for Object Navigation
PRELIMINARY
Problem Formulation
Semantic Map
Candidate Waypoints Map
METHODOLOGY
Pipeline
Multi-floor Navigation Policy
LLM-based Policy
VLM-based Policy
Multi-floor Navigation Policy
Inter-floor Navigation
...and 10 more sections

Figures (5)

Figure 1: The challenge of single-floor navigation. The target object of indoor object navigation is very likely to appear on different floors of the house, so the agent may not find the target object even if it has fully explored the current floor. Our proposed stair policy introduces the concept of indoor multi-floor navigation and proposes a feasible and learning-free solution to this challenge.
Figure 2: The general pipeline of our framework. First, we construct a semantic map from RGB-D observations $V_t$ and the global pose $G_t$. We then extract information from the semantic map and feed it to the policies to obtain the next waypoint. Our proposed stair policy decides when to explore other floors and guides the agent throughout the process. Once the next waypoint is obtained, the path planning policy computes the final action.
Figure 3: The architecture of our Multi-floor Navigation Policy. We aggregate and maintain a prompt encompassing the exploration information from each timestep to elicit recommendations from the LLM. Subsequently, we synthesize and weight all acquired information to arrive at a final determination.
Figure 4: Our policy running on the Habitat platform.
Figure 5: Real-world demonstration of MFNP on a Unitree quadruped robot.
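
The Figure 3 caption describes synthesizing and weighting the acquired information to reach a final decision. The following is a minimal sketch of that idea only, not the authors' implementation: the `Candidate` fields, the three weights, and the linear combination are all assumptions made for illustration, and the paper's own score $N_{MFNP}$ is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A possible next target, e.g. a waypoint or a staircase (hypothetical structure)."""
    name: str
    llm_score: float   # assumed LLM recommendation strength in [0, 1]
    vlm_score: float   # assumed VLM verification strength in [0, 1]
    explored: float    # assumed fraction of the surrounding area already explored


def combine(c: Candidate, w_llm: float = 0.5, w_vlm: float = 0.3,
            w_frontier: float = 0.2) -> float:
    """Weighted sum of the signals; the weights are illustrative, not from the paper."""
    return (w_llm * c.llm_score
            + w_vlm * c.vlm_score
            + w_frontier * (1.0 - c.explored))


def select_next(candidates: List[Candidate]) -> Candidate:
    """Return the candidate with the highest combined score."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=combine)


if __name__ == "__main__":
    options = [
        Candidate("kitchen_waypoint", llm_score=0.2, vlm_score=0.1, explored=0.9),
        Candidate("stairs", llm_score=0.8, vlm_score=0.6, explored=0.0),
    ]
    print(select_next(options).name)
```

In this toy setup the current floor is almost fully explored, so the staircase candidate outscores the remaining waypoint, which mirrors the situation described in the Figure 1 caption.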
