MacroNav: Multi-Task Context Representation Learning Enables Efficient Navigation in Unknown Environments

Kuankuan Sima, Longbin Tang, Haozhe Ma, Lin Zhao

TL;DR

MacroNav tackles autonomous navigation in unknown environments with partial observability by learning a compact, multi-scale context representation through three self-supervised tasks—Stochastic Path Masking, Field-of-View Prediction, and Masked Autoencoding—and integrating it into a context-aware RL policy that reasons over a local topological graph via cross-attention. The approach yields superior navigation performance in both simulation and real-world deployments, outperforming state-of-the-art baselines in Success Rate and path efficiency while maintaining low computational cost. The work demonstrates that navigation-specific self-supervised objectives can produce representations that directly enhance decision making, generalization, and efficiency in cluttered, unknown spaces. These findings suggest a practical path toward robust, scalable autonomous navigation with reduced reliance on hand-crafted heuristics.

Abstract

Autonomous navigation in unknown environments requires compact yet expressive spatial understanding under partial observability to support high-level decision making. Existing approaches struggle to balance rich contextual representation with navigation efficiency. We present MacroNav, a learning-based navigation framework featuring two key components: (1) a lightweight context encoder trained via multi-task self-supervised learning to capture multi-scale, navigation-centric spatial representations; and (2) a reinforcement learning policy that seamlessly integrates these representations with graph-based reasoning for efficient action selection. Extensive experiments demonstrate the context encoder's efficient and robust environmental understanding. Real-world deployments further validate MacroNav's effectiveness, yielding significant gains over state-of-the-art navigation methods in both Success Rate (SR) and Success weighted by Path Length (SPL), while maintaining low computational cost. Code will be released upon acceptance.
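The two metrics named in the abstract can be made concrete with a short sketch. It uses the standard definitions: SR is the fraction of successful episodes, and SPL averages, over all episodes, the success indicator multiplied by the ratio of shortest-path length to the larger of the shortest-path length and the path actually taken. The function names and the episode values below are illustrative assumptions, not taken from the paper's evaluation code or results.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the goal was reached."""
    if not successes:
        raise ValueError("need at least one episode")
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length (standard definition).

    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where l_i is the
    shortest-path length and p_i the length of the path actually taken.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")
    if not successes:
        raise ValueError("need at least one episode")

    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if ok:  # failed episodes contribute zero
            total += shortest / max(taken, shortest)
    return total / len(successes)


# Hypothetical episodes: one optimal success, one detour, one failure.
episode_success = [True, True, False]
episode_shortest = [10.0, 10.0, 10.0]
episode_taken = [10.0, 20.0, 15.0]

sr_value = success_rate(episode_success)
spl_value = spl(episode_success, episode_shortest, episode_taken)
```

SPL can never exceed SR, so a method that succeeds often but wanders scores well on SR and poorly on SPL.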

Paper Structure

This paper contains 23 sections, 14 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Effective contextual representation facilitates cognition and reasoning for navigation. We propose a context encoder trained via multi-task self-supervised learning operating at different spatial scales. An RL-based navigation policy seamlessly integrates the learned contextual representations to select the optimal waypoint.
  • Figure 2: Overall architecture of MacroNav. (a) The context map is tokenized and processed by the pre-trained context encoder to extract spatial representations. (b) Navigable nodes are encoded and fused with contextual features through cross-attention, followed by pointer attention to select the action node. (c) All encoders are built from multi-layer, multi-head attention.
  • Figure 3: Overview of the multi-task self-supervised learning method comprising three complementary tasks operating at different spatial scales.
  • Figure 4: Dataset composition for training the context encoder.
  • Figure 5: Trajectories of navigation policy with different context encoders in the unseen test environments.
  • ...and 5 more figures
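
The fusion and selection steps described in the Figure 2 caption can be illustrated with a minimal sketch. It is heavily simplified and is not the authors' implementation. It assumes a single attention head, no learned projection matrices, one layer, a residual connection after fusion, and context tokens that serve as both keys and values. All names (`cross_attend`, `pointer_select`, the toy vectors) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(nodes: List[Vector], context: List[Vector]) -> List[Vector]:
    """Fuse each node feature (query) with context tokens (keys and values).

    Assumption: scaled dot-product attention plus a residual connection.
    """
    scale = math.sqrt(len(context[0]))
    fused = []
    for query in nodes:
        weights = softmax([dot(query, key) / scale for key in context])
        attended = [
            sum(w * token[d] for w, token in zip(weights, context))
            for d in range(len(query))
        ]
        fused.append([q + a for q, a in zip(query, attended)])
    return fused


def pointer_select(query: Vector, fused_nodes: List[Vector]) -> Tuple[int, List[float]]:
    """Pointer attention: a distribution over nodes, then pick the most likely."""
    scale = math.sqrt(len(query))
    probs = softmax([dot(query, node) / scale for node in fused_nodes])
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Toy inputs: two context tokens and three candidate nodes, all 2-dimensional.
context_tokens = [[1.0, 0.0], [0.0, 1.0]]
node_features = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
decision_query = [1.0, 0.0]  # hypothetical query, e.g. derived from the robot state

fused_nodes = cross_attend(node_features, context_tokens)
action_index, action_probs = pointer_select(decision_query, fused_nodes)
```

In this sketch the pointer output is a probability distribution over candidate nodes, so the same scores could be sampled from during RL training and maximized at deployment. Whether MacroNav does this is not stated in the text above.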