Multi-Floor Zero-Shot Object Navigation Policy
Lingfeng Zhang, Hao Wang, Erjia Xiao, Xinyao Zhang, Qiang Zhang, Zixuan Jiang, Renjing Xu
TL;DR
This work tackles multi-floor object navigation by introducing MFNP, a framework that integrates LLM-based planning, VLM-based verification, and explicit inter-floor navigation to extend Zero-Shot ObjectNav beyond single floors. It constructs a semantic map from RGB-D data, generates candidate waypoints, and uses a multi-component policy to decide when to ascend stairs and explore other floors, guided by a time-aware score $N_{MFNP}$. The approach achieves state-of-the-art results on HM3D and MP3D, with notable gains in success rate and exploration efficiency, and is validated through ablations and a real-world Unitree robot demonstration. The results highlight the importance of vertical spatial reasoning in vision-based navigation and point to future work in dataset expansion and end-to-end training for multi-floor environments.
Abstract
Object navigation in multi-floor environments presents a formidable challenge in robotics, requiring sophisticated spatial reasoning and adaptive exploration strategies. Traditional approaches have primarily focused on single-floor scenarios, overlooking the complexities introduced by multi-floor structures. To address these challenges, we first propose a Multi-floor Navigation Policy (MFNP) and implement it in Zero-Shot object navigation tasks. Our framework comprises three key components: (i) Multi-floor Navigation Policy, which enables an agent to explore across multiple floors; (ii) Multi-modal Large Language Models (MLLMs) for reasoning in the navigation process; and (iii) Inter-Floor Navigation, ensuring efficient floor transitions. We evaluate MFNP on the Habitat-Matterport 3D (HM3D) and Matterport 3D (MP3D) datasets, both of which include multi-floor scenes. Our experimental results demonstrate that MFNP significantly outperforms existing methods in Zero-Shot object navigation, achieving higher success rates and improved exploration efficiency. Ablation studies further highlight the effectiveness of each component in addressing the unique challenges of multi-floor navigation. We also conducted real-world experiments to evaluate the feasibility of our policy. With MFNP deployed, a Unitree quadruped robot successfully navigated across multiple floors and found the target object in a completely unseen environment. By introducing MFNP, we offer a new paradigm for tackling complex, multi-floor environments in object navigation tasks, opening avenues for future research in vision-based navigation in realistic, multi-floor settings.
