CompassNav: Steering From Path Imitation To Decision Understanding In Navigation
LinFeng Li, Jian Zhao, Yuan Xie, Xin Tan, Xuelong Li
TL;DR
CompassNav reframes embodied navigation from pure path imitation to decision understanding by pairing a dense, action-level dataset, Compass-Data-22k, with a Gap-Aware Hybrid Reward in a two-stage SFT-then-RFT training regime. The approach yields an internal compass that evaluates all candidate moves, improving generalization and sim-to-real transfer on a 7B LVLM base. Empirically, the agent achieves state-of-the-art results on ObjectNav benchmarks and robust real-world deployment on a mobile robot, outperforming larger proprietary models with markedly lower data requirements. The work highlights the value of offline, dense supervision and adaptive reward design for efficient, decision-focused embodied agents, and opens avenues for integrating external memory systems without sacrificing policy robustness.
Abstract
The dominant paradigm for training Large Vision-Language Models (LVLMs) in navigation relies on imitating expert trajectories. This approach reduces the complex navigation task to a sequence-to-sequence replication of a single correct path, fundamentally limiting the agent's ability to explore and generalize. In this work, we argue for and introduce a new paradigm: a shift from Path Imitation to Decision Understanding. The goal of this paradigm is to build agents that do not just follow, but truly understand how to navigate. We materialize this through two core contributions. First, we introduce Compass-Data-22k, a novel 22k-trajectory dataset. Its Reinforcement Fine-Tuning (RFT) subset provides a panoramic view of the decision landscape by annotating all feasible actions with A* geodesic distances. Second, we design a novel gap-aware hybrid reward function that dynamically adapts its feedback to decision certainty, shifting between decisive signals for optimal actions and nuanced scores to encourage exploration. Integrated into an SFT-then-RFT recipe, our CompassNav agent is trained not to memorize static routes, but to develop an internal "compass" that constantly intuits the direction to the goal by evaluating the relative quality of all possible moves. This approach enables our 7B agent to set a new state-of-the-art on Goal navigation benchmarks, outperforming even larger proprietary models, and achieve robust real-world goal navigation on a physical robot.
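
To make the idea of a gap-aware reward concrete, the following is a minimal illustrative sketch, not the formulation used in the paper. It assumes each feasible action at a step is annotated with its remaining geodesic distance to the goal, and that "decision certainty" can be approximated by the gap between the best and second-best distances. The function name `gap_aware_reward`, the parameter `gap_threshold`, and the linear scoring rule are all hypothetical choices made for illustration.

```python
from typing import Sequence


def gap_aware_reward(
    distances: Sequence[float],
    chosen: int,
    gap_threshold: float = 0.5,  # hypothetical certainty threshold, not from the paper
) -> float:
    """Illustrative reward for picking action `chosen` among annotated candidates.

    distances[i] is the assumed remaining geodesic distance to the goal
    after taking candidate action i (lower is better).
    """
    if not distances:
        raise ValueError("at least one candidate action is required")

    ordered = sorted(distances)
    best, worst = ordered[0], ordered[-1]
    # Gap between the best and second-best action; infinite if there is only one.
    gap = ordered[1] - ordered[0] if len(ordered) > 1 else float("inf")

    if gap >= gap_threshold:
        # Clear winner: give a decisive, binary signal.
        return 1.0 if distances[chosen] == best else 0.0

    if worst == best:
        # All candidates are equally good, so any choice is acceptable.
        return 1.0

    # Ambiguous step: give a graded score in [0, 1] relative to the other candidates.
    return (worst - distances[chosen]) / (worst - best)
```

With distances `[2.0, 5.0, 6.0]` the first action is clearly best, so only it earns a reward. With `[2.0, 2.2, 4.0]` the top two actions are nearly tied, so the second action still earns a high graded score rather than zero, which is one simple way to avoid penalizing reasonable exploration.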
