MacroNav: Multi-Task Context Representation Learning Enables Efficient Navigation in Unknown Environments
Kuankuan Sima, Longbin Tang, Haozhe Ma, Lin Zhao
TL;DR
MacroNav tackles autonomous navigation in unknown environments with partial observability by learning a compact, multi-scale context representation through three self-supervised tasks (Stochastic Path Masking, Field-of-View Prediction, and Masked Autoencoding) and integrating it into a context-aware reinforcement learning (RL) policy that reasons over a local topological graph via cross-attention. The approach outperforms state-of-the-art baselines in Success Rate and path efficiency in both simulation and real-world deployments while maintaining low computational cost. The work demonstrates that navigation-specific self-supervised objectives can produce representations that directly improve decision making, generalization, and efficiency in cluttered, unknown spaces. These findings suggest a practical path toward robust, scalable autonomous navigation with reduced reliance on hand-crafted heuristics.
Abstract
Autonomous navigation in unknown environments requires a compact yet expressive spatial understanding under partial observability to support high-level decision making. Existing approaches struggle to balance rich contextual representation with navigation efficiency. We present MacroNav, a learning-based navigation framework with two key components: (1) a lightweight context encoder trained via multi-task self-supervised learning to capture multi-scale, navigation-centric spatial representations; and (2) a reinforcement learning policy that integrates these representations with graph-based reasoning for efficient action selection. Extensive experiments demonstrate that the context encoder understands the environment efficiently and robustly. Real-world deployments further validate MacroNav's effectiveness, yielding significant gains over state-of-the-art navigation methods in both Success Rate (SR) and Success weighted by Path Length (SPL) while maintaining low computational cost. Code will be released upon acceptance.
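
For readers unfamiliar with the two reported metrics, the sketch below shows how SR and SPL are commonly computed over a set of evaluation episodes. It assumes the widely used definition of SPL, in which each successful episode contributes the ratio of the shortest-path length to the larger of the shortest-path length and the path actually taken; the abstract does not spell out the formula, so this is an illustration rather than the authors' evaluation code. The function and argument names are hypothetical.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the goal was reached."""
    if not successes:
        raise ValueError("at least one episode is required")
    return sum(1 for s in successes if s) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length under the commonly used definition.

    Assumed formula (not taken from the abstract):
        SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)
    where S_i is the success indicator, l_i the shortest-path length,
    and p_i the length of the path the agent actually travelled.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if ok:
            # Failed episodes contribute zero; successful ones are
            # penalised in proportion to how much longer the path was.
            total += shortest / max(taken, shortest)
    return total / n


if __name__ == "__main__":
    # Three hypothetical episodes: optimal success, detour success, failure.
    outcome = [True, True, False]
    optimal = [10.0, 10.0, 10.0]
    travelled = [10.0, 20.0, 15.0]
    print(f"SR  = {success_rate(outcome):.3f}")
    print(f"SPL = {spl(outcome, optimal, travelled):.3f}")
```

Under this definition SPL can never exceed SR, so a method that succeeds often but takes long detours shows a visible gap between the two numbers.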
