Stairway to Success: An Online Floor-Aware Zero-Shot Object-Goal Navigation Framework via LLM-Driven Coarse-to-Fine Exploration
Zeying Gong, Rong Li, Tianshuai Hu, Ronghe Qiu, Lingdong Kong, Lingfeng Zhang, Guoyang Zhao, Yiyi Ding, Junwei Liang
TL;DR
ASCENT addresses object-goal navigation across multiple floors without prior maps or task-specific training. It is a floor-aware online framework with two modules: a Multi-Floor Abstraction module that builds per-floor BEV representations and an inter-floor topology, and a Coarse-to-Fine Reasoning module that reduces the number of LLM calls while preserving planning quality. Key contributions include bidirectional stair-aware navigation, a stair-inclusive obstacle map, frontier-based exploration with semantic priors, and an LLM-driven hierarchical planner. The method achieves state-of-the-art zero-shot performance on HM3D and MP3D, generalizes well across floors, and is validated in a real-world deployment on a quadruped robot, suggesting practical value for service robots in multi-floor environments.
Abstract
Deployable service and delivery robots struggle to navigate multi-floor buildings to reach object goals, because existing systems assume a single floor or require offline, globally consistent maps. Multi-floor environments pose unique challenges, including cross-floor transitions and vertical spatial reasoning, especially when navigating unknown buildings. Object-Goal Navigation benchmarks such as HM3D and MP3D also capture this multi-floor reality, yet current methods lack support for online, floor-aware navigation. To bridge this gap, we propose ASCENT, an online framework for Zero-Shot Object-Goal Navigation that enables robots to operate without pre-built maps or retraining on new object categories. ASCENT introduces: (1) a Multi-Floor Abstraction module that dynamically constructs hierarchical representations with stair-aware obstacle mapping and cross-floor topology modeling, and (2) a Coarse-to-Fine Reasoning module that combines frontier ranking with LLM-driven contextual analysis for multi-floor navigation decisions. We evaluate on the HM3D and MP3D benchmarks, where ASCENT outperforms state-of-the-art zero-shot approaches, and demonstrate real-world deployment on a quadruped robot.
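The idea of combining frontier ranking with LLM-driven analysis can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Frontier` fields, the `semantic_score` prior, the `top_k` shortlist size, the `margin` rule for skipping the LLM, and the `llm_choose` callback are all hypothetical names and assumptions introduced here only to show one way a coarse ranking stage could limit how often a fine-grained LLM stage is invoked.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Frontier:
    """A candidate exploration target (hypothetical structure)."""
    floor: int                      # index of the floor the frontier lies on
    position: Tuple[float, float]   # BEV map coordinates
    semantic_score: float           # assumed prior relevance to the goal, in [0, 1]


def coarse_rank(frontiers: Sequence[Frontier], top_k: int = 3) -> List[Frontier]:
    """Coarse stage: keep the top_k frontiers by semantic score."""
    return sorted(frontiers, key=lambda f: f.semantic_score, reverse=True)[:top_k]


def select_frontier(
    frontiers: Sequence[Frontier],
    llm_choose: Callable[[List[Frontier]], int],
    top_k: int = 3,
    margin: float = 0.2,
) -> Frontier:
    """Fine stage: query the LLM only when the shortlist is ambiguous.

    llm_choose is a placeholder for an LLM query that returns the index
    of the preferred frontier within the shortlist.
    """
    if not frontiers:
        raise ValueError("no frontiers to choose from")
    shortlist = coarse_rank(frontiers, top_k)
    # Assumption: a clear score gap makes the LLM call unnecessary.
    if len(shortlist) == 1 or (
        shortlist[0].semantic_score - shortlist[1].semantic_score >= margin
    ):
        return shortlist[0]
    index = llm_choose(shortlist)
    # Fall back to the best coarse candidate if the index is out of range.
    if not 0 <= index < len(shortlist):
        return shortlist[0]
    return shortlist[index]
```

In this sketch, a stub such as `lambda shortlist: 0` can stand in for `llm_choose` during testing. The stub is only consulted when the two best frontiers score within `margin` of each other.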
