What Matters in RL-Based Methods for Object-Goal Navigation? An Empirical Study and A Unified Framework

Hongze Wang; Boyang Sun; Jiaxu Xing; Fan Yang; Marco Hutter; Dhruv Shah; Davide Scaramuzza; Marc Pollefeys

TL;DR

This paper studies Object-Goal Navigation (ObjectNav) by decomposing modular RL pipelines into perception, policy, and test-time enhancement, and comparing their contributions in controlled experiments. It finds that perception quality and test-time strategies are the primary drivers of performance, while policy improvements offer limited gains under current training methods. The authors propose concrete design guidelines and demonstrate a modular system that sets a new state of the art (SotA) on standard benchmarks, while revealing a substantial gap to human experts, who reach an average success rate (SR) of 98%. The work emphasizes principled evaluation and practical deployment considerations, including dynamic evaluation and plug-in enhancements, to accelerate progress toward robust, real-world ObjectNav systems.

Abstract

Object-Goal Navigation (ObjectNav) is a critical step toward deploying mobile robots in everyday, uncontrolled environments such as homes, schools, and workplaces. In this setting, a robot must locate target objects in previously unseen environments using only its onboard perception. Success requires integrating semantic understanding, spatial reasoning, and long-horizon planning, a combination that remains extremely challenging. While reinforcement learning (RL) has become the dominant paradigm, progress has spanned a wide range of design choices, and the field still lacks a unifying analysis of which components truly drive performance. In this work, we conduct a large-scale empirical study of modular RL-based ObjectNav systems, decomposing them into three key components: perception, policy, and test-time enhancement. Through extensive controlled experiments, we isolate the contribution of each and uncover clear trends: perception quality and test-time strategies are decisive drivers of performance, whereas policy improvements with current methods yield only marginal gains. Building on these insights, we propose practical design guidelines and demonstrate an enhanced modular system that surpasses state-of-the-art (SotA) methods by 6.6% in SPL and by 2.7% in success rate. We also introduce a human baseline under identical conditions, in which experts achieve an average success rate of 98%, underscoring the gap between RL agents and human-level navigation. Our study not only sets a new SotA but also provides principled guidance for future ObjectNav development and evaluation.
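The two metrics quoted in the abstract, success rate and SPL, can be illustrated with a minimal sketch. It assumes the paper uses the standard definition of SPL (Success weighted by Path Length), in which each successful episode contributes the ratio of the shortest-path length to the longer of the shortest path and the agent's actual path. The `Episode` class, its field names, and the sample numbers are hypothetical and serve only as illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One evaluation episode (hypothetical container, not from the paper)."""
    success: bool          # True if the agent stopped at the target object
    shortest_path: float   # shortest-path length from start to goal, assumed > 0
    agent_path: float      # length of the path the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, assuming the standard definition."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if e.success:
            # Efficiency is capped at 1.0 when the agent matches the shortest path.
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


# Made-up episodes for illustration only.
episodes = [
    Episode(success=True, shortest_path=5.0, agent_path=10.0),
    Episode(success=True, shortest_path=4.0, agent_path=4.0),
    Episode(success=False, shortest_path=3.0, agent_path=9.0),
    Episode(success=True, shortest_path=6.0, agent_path=8.0),
]
print(success_rate(episodes))  # 0.75
print(spl(episodes))           # 0.5625
```

Under this definition SPL can never exceed the success rate, so an agent can raise its SPL either by succeeding more often or by taking shorter paths in the episodes where it already succeeds.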
