Path-conditioned Reinforcement Learning-based Local Planning for Long-Range Navigation

Mateo Haro, Julia Richter, Fan Yang, Cesar Cadena, Marco Hutter

Abstract

Long-range navigation is commonly addressed through hierarchical pipelines in which a global planner generates a path that is decomposed into waypoints and followed sequentially by a local planner. These systems are sensitive to global path quality: inaccurate remote sensing data can produce locally infeasible waypoints that degrade local execution. At the same time, the limited global context available to the local planner hinders long-range efficiency. To address these issues, we propose a reinforcement learning-based local navigation policy that leverages path information as contextual guidance. The policy is conditioned on reference path observations and trained with a reward function based primarily on goal-reaching objectives, without any explicit path-following reward. Through this implicit conditioning, the policy learns to exploit path information opportunistically while remaining robust to misleading or degraded guidance. Experimental results show that the proposed approach significantly improves navigation efficiency when high-quality paths are available and maintains baseline-level performance when path observations are severely degraded or entirely absent. These properties make the method particularly well suited to long-range navigation scenarios in which high-level plans are approximate and local execution must remain adaptive to uncertainty.
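The abstract names two mechanisms without giving their details: the policy observes the reference path, and the reward never rewards following it. The following is a hypothetical sketch only, assuming a 2D polyline path and NumPy. It encodes the path as a fixed number of points resampled ahead of the robot and expressed in its body frame, with a validity flag so that a missing path maps to zeros. It adds one possible corruption (complete dropout or a rigid random offset) to emulate degraded or absent guidance, and defines a progress-based goal reward that never reads the path. All names, constants, and reward weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical constants: illustrative assumptions, not values from the paper.
NUM_PATH_POINTS = 10  # reference points exposed to the policy
POINT_SPACING = 1.0   # arc-length spacing between sampled points (m)


def resample_ahead(path_xy, start_idx, num_points, spacing):
    """Sample points at fixed arc-length spacing along path_xy[start_idx:]."""
    pts = path_xy[start_idx:]
    if len(pts) == 1:
        return np.repeat(pts, num_points, axis=0)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    # Samples beyond the end of the path are clamped to its final vertex.
    query = np.minimum(np.arange(1, num_points + 1) * spacing, arc[-1])
    return np.stack([np.interp(query, arc, pts[:, 0]),
                     np.interp(query, arc, pts[:, 1])], axis=1)


def path_observation(path_xy, robot_xy, robot_yaw):
    """Return (flattened robot-frame path points, validity flag).

    A missing path (None or empty) yields zeros with flag 0.0, so one
    input layout covers both the 'path' and 'no path' cases.
    """
    if path_xy is None or len(path_xy) == 0:
        return np.zeros(2 * NUM_PATH_POINTS), 0.0
    path_xy = np.asarray(path_xy, dtype=float)
    robot_xy = np.asarray(robot_xy, dtype=float)
    # Start from the path vertex closest to the robot.
    start = int(np.argmin(np.linalg.norm(path_xy - robot_xy, axis=1)))
    pts = resample_ahead(path_xy, start, NUM_PATH_POINTS, POINT_SPACING)
    # World frame -> robot frame: translate, then rotate by -yaw.
    c, s = np.cos(robot_yaw), np.sin(robot_yaw)
    world_to_robot = np.array([[c, s], [-s, c]])
    local = (pts - robot_xy) @ world_to_robot.T
    return local.ravel(), 1.0


def degrade_path(path_xy, rng, shift_std=2.0, drop_prob=0.2):
    """Hypothetical corruption: drop the path, or shift it by a random offset."""
    if rng.random() < drop_prob:
        return None
    path_xy = np.asarray(path_xy, dtype=float)
    return path_xy + rng.normal(0.0, shift_std, size=2)


def goal_reward(prev_goal_dist, goal_dist, reached, collided):
    """Goal-reaching reward (assumed weights). No term reads the path."""
    reward = prev_goal_dist - goal_dist  # progress toward the goal
    if reached:
        reward += 10.0
    if collided:
        reward -= 10.0
    return reward


path = np.array([[0.0, 0.0], [5.0, 0.0], [5.0, 5.0]])
obs, valid = path_observation(path, [0.0, 0.0], 0.0)
```

Because `goal_reward` ignores the path entirely, the path observation can only influence the learned policy to the extent that it helps reach the goal. This is the sense in which the conditioning is implicit, and it is also why a misleading path need not be followed.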

Paper Structure

This paper contains 21 sections, 16 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: The proposed path-aware local planner utilizes a reference path (b) to provide contextual guidance, augmenting depth camera observations (a) and proprioceptive feedback.
  • Figure 2: Overview of the proposed path-aware navigation architecture. The base navigation pipeline, comprising the pretrained depth image encoder, the spatially-enhanced recurrent unit, and the fusion of the relative goal position $p_t$ and proprioceptive observations $o_t^{prop}$ with visual features, follows yang2025spatially. We extend this framework with the proposed Path Encoding module, which enables implicit conditioning on reference path information.
  • Figure 3: Qualitative comparison of navigation trajectories in $50\,\mathrm{m}\times50\,\mathrm{m}$ and $40\,\mathrm{m}\times40\,\mathrm{m}$ maze environments. For each policy, 100 trajectories are shown.
  • Figure 4: Comparison of training reward across different path encoding architectures, illustrating the effect of architectural choices on learning performance.
  • Figure 5: Qualitative comparison of 100 navigation trajectories under input ablation; axes are in meters.
  • ...and 1 more figure
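Figure 2 adds a Path Encoding module to an existing pipeline, and Figure 4 compares path encoding architectures without naming them in this summary. As a hypothetical illustration only, the NumPy sketch below shows two plausible encoder variants and a concatenation-based fusion with the visual features, relative goal position $p_t$, and proprioception $o_t^{prop}$. Variant A is a flat MLP over all points; variant B is a shared per-point MLP with max pooling. Layer sizes, the index feature, the validity flag, and all names are assumptions. The weights are random and untrained, and the fused vector merely stands in for the input to the recurrent unit.

```python
import numpy as np

# Hypothetical sizes: not taken from the paper.
NUM_PATH_POINTS = 10
PATH_FEAT_DIM = 32


def mlp_params(sizes, rng):
    """Random (untrained) weights for a small MLP, for illustration only."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Apply dense layers with ReLU on every hidden layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


class FlatPathEncoder:
    """Variant A: flatten the (N, 2) points and apply one MLP."""

    def __init__(self, num_points, out_dim, rng):
        self.params = mlp_params([2 * num_points, 64, out_dim], rng)

    def __call__(self, pts):
        return mlp(self.params, pts.reshape(-1))


class PooledPathEncoder:
    """Variant B: shared per-point MLP followed by max pooling over points."""

    def __init__(self, out_dim, rng):
        self.params = mlp_params([3, 64, out_dim], rng)

    def __call__(self, pts):
        # Max pooling ignores order, so append each point's normalized index.
        idx = np.arange(len(pts))[:, None] / max(len(pts) - 1, 1)
        feats = mlp(self.params, np.concatenate([pts, idx], axis=1))
        return feats.max(axis=0)


def fuse(visual_feat, goal_rel, proprio, path_feat, path_valid):
    """Concatenate all inputs; the path feature is zeroed when no path exists."""
    return np.concatenate([visual_feat, goal_rel, proprio,
                           path_valid * path_feat, [path_valid]])


rng = np.random.default_rng(0)
pts = rng.normal(size=(NUM_PATH_POINTS, 2))  # stand-in path in robot frame
enc_a = FlatPathEncoder(NUM_PATH_POINTS, PATH_FEAT_DIM, rng)
enc_b = PooledPathEncoder(PATH_FEAT_DIM, rng)
visual = np.zeros(128)        # stand-in for depth-encoder features
goal = np.array([3.0, -1.0])  # relative goal position p_t
proprio = np.zeros(12)        # stand-in for o_t^prop
z = fuse(visual, goal, proprio, enc_b(pts), 1.0)
```

Variant A ties each input slot to a fixed position along the path, while variant B shares weights across points and needs the appended index to retain ordering. Which of these trade-offs helps learning is the kind of architectural choice Figure 4 compares empirically.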