PanoNav: Mapless Zero-Shot Object Navigation with Panoramic Scene Parsing and Dynamic Memory

Qunchao Jin; Yilin Wu; Changhao Chen

TL;DR

PanoNav tackles RGB-only, mapless zero-shot object navigation. Its Panoramic Scene Parsing module extracts fine-grained local and global spatial cues from six directional RGB views paired with dot-matrix inputs. A Dynamic Bounded Memory Queue adds exploration history to the LLM-based decision-maker, helping it avoid local deadlocks and explore more efficiently. On the HM3D benchmark, PanoNav surpasses state-of-the-art baselines in success rate (SR) and success weighted by path length (SPL) under mapless, open-vocabulary settings, which supports both the perceptual parsing and the memory-guided decision components. The results indicate that rich panoramic parsing combined with historical context enables open-vocabulary navigation from RGB inputs alone, with practical implications for hardware-efficient household robots.

Abstract

Zero-shot object navigation (ZSON) in unseen environments remains a challenging problem for household robots, requiring strong perceptual understanding and decision-making capabilities. While recent methods leverage metric maps and Large Language Models (LLMs), they often depend on depth sensors or prebuilt maps, limiting the spatial reasoning ability of Multimodal Large Language Models (MLLMs). Mapless ZSON approaches have emerged to address this, but they typically make short-sighted decisions and, lacking historical context, fall into local deadlocks. We propose PanoNav, a fully RGB-only, mapless ZSON framework that integrates a Panoramic Scene Parsing module to unlock the spatial parsing potential of MLLMs from panoramic RGB inputs, and a Memory-guided Decision-Making mechanism enhanced by a Dynamic Bounded Memory Queue to incorporate exploration history and avoid local deadlocks. Experiments on a public navigation benchmark show that PanoNav significantly outperforms representative baselines in both SR and SPL metrics.
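
To make the idea of a bounded exploration history concrete, the following is a minimal sketch of a fixed-capacity FIFO queue whose contents are rendered as text for an LLM prompt. It is an illustration under stated assumptions, not the authors' implementation: the class names, the record fields, the default capacity, and the prompt wording are all hypothetical, and the sketch does not attempt to model whatever makes the paper's queue "dynamic".

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRecord:
    """One exploration step. All fields are hypothetical, not from the paper."""
    step: int
    direction: str      # assumed label such as "front" or "back-left"
    scene_summary: str  # short text summary of what was observed


class BoundedMemoryQueue:
    """Fixed-capacity FIFO history; the oldest record is evicted when full."""

    def __init__(self, capacity: int = 5) -> None:
        # The default capacity is an arbitrary illustrative choice.
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # deque(maxlen=...) drops items from the opposite end on append.
        self._items = deque(maxlen=capacity)

    def push(self, record: StepRecord) -> None:
        """Append the newest step, evicting the oldest if at capacity."""
        self._items.append(record)

    def __len__(self) -> int:
        return len(self._items)

    def to_prompt(self) -> str:
        """Render the retained history, oldest first, as plain text."""
        if not self._items:
            return "No exploration history yet."
        return "\n".join(
            f"Step {r.step}: moved {r.direction}; saw {r.scene_summary}"
            for r in self._items
        )
```

Bounding the history keeps the prompt length constant as the episode grows, while still exposing recent moves so that a decision-maker can notice it is revisiting the same place.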
