Stairway to Success: An Online Floor-Aware Zero-Shot Object-Goal Navigation Framework via LLM-Driven Coarse-to-Fine Exploration

Zeying Gong, Rong Li, Tianshuai Hu, Ronghe Qiu, Lingdong Kong, Lingfeng Zhang, Guoyang Zhao, Yiyi Ding, Junwei Liang

TL;DR

ASCENT tackles the challenge of object-goal navigation across multiple floors without prior maps or task-specific training. It introduces a floor-aware online framework with a Multi-Floor Abstraction module for per-floor BEV representations and inter-floor topology, and a Coarse-to-Fine Reasoning module that dramatically reduces LLM calls while preserving planning quality. Key contributions include bidirectional stair-aware navigation, a stair-inclusive obstacle map, frontier-based exploration with semantic priors, and an LLM-driven hierarchical planner. The method achieves state-of-the-art zero-shot performance on HM3D and MP3D, demonstrates strong cross-floor generalization, and validates real-world deployment on a quadruped robot, signaling practical impact for service robots in multi-floor environments.

Abstract

Deployable service and delivery robots struggle to navigate multi-floor buildings to reach object goals, as existing systems fail due to single-floor assumptions and requirements for offline, globally consistent maps. Multi-floor environments pose unique challenges, including cross-floor transitions and vertical spatial reasoning, especially when navigating unknown buildings. Object-Goal Navigation benchmarks like HM3D and MP3D also capture this multi-floor reality, yet current methods lack support for online, floor-aware navigation. To bridge this gap, we propose ASCENT, an online framework for Zero-Shot Object-Goal Navigation that enables robots to operate without pre-built maps or retraining on new object categories. It introduces: (1) a Multi-Floor Abstraction module that dynamically constructs hierarchical representations with stair-aware obstacle mapping and cross-floor topology modeling, and (2) a Coarse-to-Fine Reasoning module that combines frontier ranking with LLM-driven contextual analysis for multi-floor navigation decisions. We evaluate on the HM3D and MP3D benchmarks, outperforming state-of-the-art zero-shot approaches, and demonstrate real-world deployment on a quadruped robot.
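
To make the coarse-to-fine idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each frontier already carries a numeric semantic score, and that the LLM is consulted only when the top candidates are too close to separate cheaply. The names `Frontier`, `coarse_rank`, `choose_frontier`, `ask_llm`, `top_k`, and `margin` are hypothetical, and the margin-based gating rule is an illustrative assumption rather than something stated in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Frontier:
    """A candidate exploration target (hypothetical type for illustration)."""
    name: str
    score: float  # assumed semantic score; higher means more promising


def coarse_rank(frontiers: Sequence[Frontier], top_k: int) -> List[Frontier]:
    """Coarse stage: keep the top_k frontiers by score, without any LLM call."""
    return sorted(frontiers, key=lambda f: f.score, reverse=True)[:top_k]


def choose_frontier(
    frontiers: Sequence[Frontier],
    ask_llm: Callable[[List[Frontier]], Frontier],
    top_k: int = 3,
    margin: float = 0.2,
) -> Frontier:
    """Pick a frontier, calling the LLM only when the coarse ranking is ambiguous.

    The margin test is an assumption made for this sketch: if the best
    candidate leads the runner-up by at least `margin`, it is taken directly.
    """
    if not frontiers:
        raise ValueError("no frontiers to choose from")
    shortlist = coarse_rank(frontiers, top_k)
    clear_winner = (
        len(shortlist) == 1
        or shortlist[0].score - shortlist[1].score >= margin
    )
    if clear_winner:
        return shortlist[0]
    # Fine stage: delegate the ambiguous shortlist to the (stubbed) LLM.
    return ask_llm(shortlist)
```

In this sketch `ask_llm` is any callable that receives the shortlist and returns one of its entries, so a real LLM client or a test stub can be plugged in. Gating the call on ambiguity is one simple way a planner could reduce the number of LLM queries.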

Paper Structure

This paper contains 34 sections, 6 equations, 15 figures, 11 tables.

Figures (15)

  • Figure 1: Motivation of ASCENT. Unlike prior approaches that fail in multi-floor scenarios, our method enables online multi-floor navigation. By reasoning across floors, our policy succeeds in locating the goal and demonstrates a meaningful step forward in Zero-Shot Object-Goal Navigation (ZS-OGN).
  • Figure 2: Framework overview of ASCENT. The system takes RGB-D inputs (top-left) and outputs navigation actions (bottom-right). The Multi-Floor Abstraction module (top) builds intra-floor BEV maps and models inter-floor connectivity. The Coarse-to-Fine Reasoning module (bottom) uses the LLM for contextual reasoning across floors. Together, the two modules enable floor-aware Zero-Shot Object-Goal Navigation.
  • Figure 3: Stair Detection process. ASCENT detects upward stairs (top) and infers downward stairs using depth-based analysis (bottom).
  • Figure 4: Illustration of Fine-Grained Decision. Following a coarse-grained assessment, the robot feeds cached contextual information and learned object priors to the LLM, which then decides whether to perform inter-floor transition or intra-floor navigation.
  • Figure 5: Multi-floor scenario statistics in OGN benchmarks. Across HM3D and MP3D, over half of scenarios involve multiple floors, with approximately one-fifth requiring cross-floor navigation.
  • ...and 10 more figures