REST: Receding Horizon Explorative Steiner Tree for Zero-Shot Object-Goal Navigation

Shuqi Xiao, Maani Ghaffari, Chengzhong Xu, Hui Kong

Abstract

Zero-shot object-goal navigation (ZSON) requires navigating unknown environments to find a target object without task-specific training. Prior hierarchical training-free solutions invest in scene understanding (\textit{belief}) and high-level decision-making (\textit{policy}), yet overlook the design of \textit{option}, i.e., a subgoal candidate proposed from evolving belief and presented to policy for selection. In practice, options are reduced to isolated waypoints scored independently: single destinations hide the value gathered along the journey; an unstructured collection obscures the relationships among candidates. Our insight is that the option space should be a \textit{tree of paths}. Full paths expose en-route information gain that destination-only scoring systematically neglects; a tree of shared segments enables coarse-to-fine LLM reasoning that dismisses or pursues entire branches before examining individual leaves, compressing the combinatorial path space into an efficient hierarchy. We instantiate this insight in \textbf{REST} (Receding Horizon Explorative Steiner Tree), a training-free framework that (1) builds an explicit open-vocabulary 3D map from online RGB-D streams; (2) grows an agent-centric tree of safe and informative paths as the option space via sampling-based planning; and (3) textualizes each branch into a spatial narrative and selects the next-best path through chain-of-thought LLM reasoning. Across the Gibson, HM3D, and HSSD benchmarks, REST consistently ranks among the top methods in success rate while achieving the best or second-best path efficiency, demonstrating a favorable efficiency-success balance.
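
The contrast between destination-only scoring and path-level scoring over a tree can be illustrated with a minimal sketch. Everything here is hypothetical and chosen only for illustration: the `Node` class, the place names, the numeric `gain` values, and the additive scoring. REST itself selects branches through chain-of-thought LLM reasoning over spatial narratives, so the numeric `subtree_value` below is only a stand-in for that judgment.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    gain: float  # hypothetical information gain collected at this node
    children: list["Node"] = field(default_factory=list)


def leaf_paths(node, prefix=()):
    """Yield every root-to-leaf path as a tuple of nodes."""
    path = prefix + (node,)
    if not node.children:
        yield path
    for child in node.children:
        yield from leaf_paths(child, path)


def destination_score(path):
    """Destination-only scoring: look at the final waypoint and nothing else."""
    return path[-1].gain


def path_score(path):
    """Path-level scoring: accumulate the gain gathered along the whole route."""
    return sum(n.gain for n in path)


def subtree_value(node):
    """Best cumulative gain reachable inside this subtree (numeric stand-in
    for the branch-level judgment that REST delegates to an LLM)."""
    return node.gain + max((subtree_value(c) for c in node.children), default=0.0)


def coarse_to_fine(root):
    """Descend from the root, committing to one whole branch at each junction
    before looking at the leaves underneath it."""
    path = [root]
    while path[-1].children:
        path.append(max(path[-1].children, key=subtree_value))
    return path


# Hypothetical option tree rooted at the agent.
root = Node("agent", 0.0, [
    Node("hallway", 3.0, [Node("bedroom", 2.0), Node("closet", 1.0)]),
    Node("corner", 0.0, [Node("balcony", 4.0)]),
])

paths = list(leaf_paths(root))
best_by_destination = max(paths, key=destination_score)
best_by_path = max(paths, key=path_score)
chosen = coarse_to_fine(root)

print([n.name for n in best_by_destination])  # route ending at the best single waypoint
print([n.name for n in best_by_path])         # route with the best accumulated gain
print([n.name for n in chosen])               # route found by branch-first descent
```

In this toy tree the destination-only scorer heads for the balcony because its endpoint has the largest gain, while the path-level scorer prefers the hallway route, whose en-route gain makes the total larger. The branch-first descent reaches the same leaf while comparing only two subtrees at the first junction.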

Paper Structure (35 sections, 1 equation, 3 figures, 2 tables, 1 algorithm)

Figures (3)

  • Figure 1: REST reasons over an agent-centric tree of safe and informative paths rather than evaluating isolated waypoints. Here, REST selects the next-best subtree among four options via spatial narratives; a conventional agent (e.g., VLFM [Yokoyama et al., 2024]) independently scores ten waypoints by semantic similarity and geometric proximity, discarding spatial-temporal context.
  • Figure 2: Overview of REST, a training-free ObjectNav framework that replans in a receding-horizon manner. At each decision cycle, the agent updates its open-vocabulary 3D map from online RGB-D streams, grows an agent-centric Steiner tree of safe and informative paths as the option space, textualizes each branch into a spatial narrative, and selects the next-best path through chain-of-thought LLM reasoning.
  • Figure 3: The RT-RRT* subtree connecting the current agent position (index 0) to all informative viewpoints (indices 1 to 15) (left) versus the optimized Steiner tree (right). Independent per-path optimization produces redundant edges, while the OAESMT formulation merges shared segments and surfaces decision junctions, reducing total path length from $85 \mathrm{m}$ to $47 \mathrm{m}$.
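
The effect described in the Figure 3 caption, where merging shared segments at a junction shortens the tree, can be sketched for the smallest non-trivial case: one agent and two viewpoints in an obstacle-free plane. This is an illustration under stated assumptions, not the paper's OAESMT solver. The coordinates are made up, obstacles are ignored, and the junction is found with the Weiszfeld iteration for the geometric median, which coincides with the Steiner point for three terminals when every angle of the triangle is below 120 degrees. It does not reproduce the 85 m and 47 m figures.

```python
import math


def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def fermat_point(terminals, iterations=200):
    """Weiszfeld iteration: the point minimizing the summed distance to the
    terminals. For three terminals with all angles below 120 degrees this is
    the Steiner junction."""
    x = sum(t[0] for t in terminals) / len(terminals)
    y = sum(t[1] for t in terminals) / len(terminals)
    for _ in range(iterations):
        weights = []
        for t in terminals:
            d = dist((x, y), t)
            if d < 1e-12:  # the iterate landed on a terminal; stop there
                return t
            weights.append(1.0 / d)
        total = sum(weights)
        x = sum(w * t[0] for w, t in zip(weights, terminals)) / total
        y = sum(w * t[1] for w, t in zip(weights, terminals)) / total
    return (x, y)


# Hypothetical layout: the agent at the origin, two viewpoints far down a corridor.
agent = (0.0, 0.0)
viewpoints = [(10.0, 1.0), (10.0, -1.0)]

# Independent per-path planning: one separate straight path per viewpoint.
independent_length = sum(dist(agent, v) for v in viewpoints)

# Shared-segment tree: one trunk to a junction, then a short edge to each viewpoint.
junction = fermat_point([agent] + viewpoints)
steiner_length = dist(agent, junction) + sum(dist(junction, v) for v in viewpoints)

print(f"independent paths: {independent_length:.2f} m")
print(f"shared-segment tree: {steiner_length:.2f} m")
print(f"junction at ({junction[0]:.2f}, {junction[1]:.2f})")
```

The junction sits just before the two viewpoints diverge, so most of the corridor is traversed once instead of twice. That junction is also the natural place for a branch-level decision.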