HDPlanner: Advancing Autonomous Deployments in Unknown Environments through Hierarchical Decision Networks

Jingsong Liang, Yuhong Cao, Yixiao Ma, Hanqi Zhao, Guillaume Sartoretti

TL;DR

HDPlanner introduces a hierarchical deep reinforcement learning framework for autonomous exploration and navigation in unknown, partially observable environments. It decomposes long-term objectives into high-level beacon selection and low-level waypoint planning, using a Transformer-based viewpoint encoder and cross-attention beacon and waypoint decoders, together with a hierarchical critic and a contrastive joint optimization that improves robustness. The method plans in real time, reduces travel distance by up to 35.7% against exploration benchmarks and up to 16.5% against navigation benchmarks, and outperforms state-of-the-art baselines across extensive simulations, large-scale Gazebo environments, and real-world hardware experiments in indoor and outdoor settings. These results indicate the practical potential of hierarchical decision networks for robust autonomous deployments in complex, unknown environments.

Abstract

In this paper, we introduce HDPlanner, a deep reinforcement learning (DRL) based framework designed to tackle two core and challenging tasks for mobile robots: autonomous exploration and navigation, where the robot must optimize its trajectory adaptively to achieve the task objective through continuous interactions in unknown environments. Specifically, HDPlanner relies on novel hierarchical attention networks to empower the robot to reason about its belief across multiple spatial scales and sequence collaborative decisions, where our networks decompose long-term objectives into short-term informative task assignments and informative path planning. We further propose a contrastive learning-based joint optimization to enhance the robustness of HDPlanner. We empirically demonstrate that HDPlanner significantly outperforms state-of-the-art conventional and learning-based baselines on an extensive set of simulations, including hundreds of test maps and large-scale, complex Gazebo environments. Notably, HDPlanner achieves real-time planning with travel distances reduced by up to 35.7% compared to exploration benchmarks and by up to 16.5% compared to navigation benchmarks. Furthermore, we validate our approach on hardware, where it generates high-quality, adaptive trajectories in both indoor and outdoor environments, highlighting its real-world applicability without additional training.

Paper Structure

This paper contains 22 sections, 6 equations, 5 figures, 4 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Illustration of our proposed hierarchical planning for autonomous deployments. HDPlanner breaks down long-term objectives for exploration/navigation into short-term sub-goals to accomplish the tasks efficiently: high-level assignment by choosing a beacon (white tick), which indicates an informative area, and low-level reactive planning through consecutive waypoints (yellow arrow) to collect information while approaching the designated beacon.
  • Figure 2: Illustration of Observations. In (a), the robot forms its observation at time step $t$. Specifically, the beacons $b_i^t$ are generated based on the current non-zero utility viewpoints. In (b), upon reaching the new position $p_{t+1}$, the robot's belief expands, leading to an update of the beacons to $b_i^{t+1}$ at time step $t+1$.
  • Figure 3: Our proposed hierarchical decision networks. The viewpoint encoder first integrates spatial information from the current observation to produce encoded features $\mathcal{H}_e$, where each viewpoint shares its position along with its planning sets based on the current robot belief. The encoded features of the robot's current position and of the beacons, $h_p$ and $\mathcal{H}_b$, are then passed to the beacon decoder to choose a beacon $\hat{b}$, which indicates the most informative area. The waypoint decoder then uses the encoded features of the selected beacon and of the robot's neighbors, $h_{\hat{b}}$ and $\mathcal{H}_a$, to output a waypoint $\hat{a}$, which indicates a collision-free trajectory starting from the robot's current position, for informative path planning while approaching the selected beacon.
  • Figure 4: Trajectory comparisons of exploration and navigation planners in large-scale Gazebo simulation. The colored curve represents the trajectory produced by the ground vehicle. In (a), each planner starts its exploration from the white dot. In (b), each planner navigates through consecutive predefined targets (red dots). After reaching each target, the planner is reset to ensure the navigation task begins in a fully unknown environment.
  • Figure 5: Real-robot experiments.
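
The two-stage decision described in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the encoded features are already available as plain vectors, and it replaces the learned cross-attention decoders with a single unparameterized scaled dot-product attention step. The function names (`pointer_attention`, `pick`, `hierarchical_decision`) and the toy feature values are hypothetical; only the symbols $h_p$, $\mathcal{H}_b$, $\mathcal{H}_a$, $\hat{b}$, and $\hat{a}$ come from the caption.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pointer_attention(query, keys):
    """Attention weights of one query over candidate keys.

    Assumption: no learned projections; the weights are used directly
    as a selection distribution over the candidates.
    """
    scale = math.sqrt(len(query))
    return softmax([dot(query, k) / scale for k in keys])


def pick(probs, greedy, rng):
    """Return the argmax index (greedy) or sample an index from probs."""
    if greedy:
        return max(range(len(probs)), key=probs.__getitem__)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def hierarchical_decision(h_p, H_b, H_a, greedy=True, rng=random):
    """Choose a beacon first, then a waypoint conditioned on that beacon."""
    # High level: the current-position feature h_p attends over beacons H_b.
    beacon_probs = pointer_attention(h_p, H_b)
    b_hat = pick(beacon_probs, greedy, rng)
    # Low level: the selected beacon's feature attends over neighbors H_a.
    waypoint_probs = pointer_attention(H_b[b_hat], H_a)
    a_hat = pick(waypoint_probs, greedy, rng)
    return b_hat, a_hat, beacon_probs, waypoint_probs


# Toy features (hypothetical values, 2-dimensional for readability).
h_p = [1.0, 0.0]
H_b = [[0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
H_a = [[1.0, 1.0], [3.0, 0.0], [0.0, 2.0]]

b_hat, a_hat, _, _ = hierarchical_decision(h_p, H_b, H_a)
print(b_hat, a_hat)  # greedy choice: beacon index 1, then waypoint index 1
```

The point of the sketch is the conditioning order: the waypoint distribution is computed from the feature of the beacon chosen at the high level, so the low-level choice changes whenever the high-level assignment changes. Passing `greedy=False` samples from both distributions instead of taking the argmax, as one would typically do during reinforcement learning training.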