HDPlanner: Advancing Autonomous Deployments in Unknown Environments through Hierarchical Decision Networks
Jingsong Liang, Yuhong Cao, Yixiao Ma, Hanqi Zhao, Guillaume Sartoretti
TL;DR
HDPlanner introduces a hierarchical deep reinforcement learning framework for autonomous exploration and navigation in unknown, partially observable environments. It decomposes long-term objectives into high-level beacon selection and low-level waypoint planning. These decisions are produced by a Transformer-based viewpoint encoder and cross-attention beacon and waypoint decoders, and robustness is improved with a hierarchical critic and contrastive joint optimization. The method plans in real time, reduces travel distance by up to 35.7% against exploration benchmarks and by up to 16.5% against navigation benchmarks, and outperforms state-of-the-art baselines across extensive simulations, large Gazebo environments, and real-world hardware experiments in indoor and outdoor settings. These results demonstrate the practical potential of hierarchical decision networks for robust, scalable autonomous deployments in complex, uncertain terrain.
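To make the two-level decision structure concrete, the sketch below shows one way a viewpoint encoder and two cross-attention (pointer-style) decoders could be chained. It is a minimal illustration under stated assumptions, not HDPlanner's actual architecture: all names (`self_attention`, `pointer_decode`, `hierarchical_step`, `frontier_mask`, `neighbor_mask`) are hypothetical, the encoder is a single untrained attention head with random weights, and the low-level query is simply the current-node embedding plus the chosen beacon's embedding. Decisions are taken greedily with `argmax`, as one might at inference time.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax; entries equal to -inf receive probability 0.
    e = np.exp(x - x.max())
    return e / e.sum()


def self_attention(nodes, wq, wk, wv):
    # Single-head scaled dot-product self-attention over viewpoint features
    # (a stand-in for a Transformer viewpoint encoder), with a residual connection.
    q, k, v = nodes @ wq, nodes @ wk, nodes @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return nodes + attn @ v


def pointer_decode(query, keys, mask):
    # Cross-attention from one query to all candidate keys; candidates outside
    # the mask are excluded by setting their logits to -inf.
    logits = keys @ query / np.sqrt(keys.shape[1])
    logits = np.where(mask, logits, -np.inf)
    return softmax(logits)


def hierarchical_step(features, current, frontier_mask, neighbor_mask, params):
    # Hypothetical two-level decision: pick a beacon (high level), then a
    # neighbouring waypoint conditioned on that beacon (low level).
    emb = self_attention(features, *params)
    beacon_probs = pointer_decode(emb[current], emb, frontier_mask)
    beacon = int(np.argmax(beacon_probs))
    query = emb[current] + emb[beacon]  # assumed way of conditioning on the beacon
    waypoint_probs = pointer_decode(query, emb, neighbor_mask)
    waypoint = int(np.argmax(waypoint_probs))
    return beacon, waypoint, beacon_probs, waypoint_probs


# Usage on a toy graph of 6 viewpoints with 8-dimensional random features.
rng = np.random.default_rng(0)
n, d = 6, 8
features = rng.normal(size=(n, d))
params = tuple(rng.normal(size=(d, d)) * 0.1 for _ in range(3))
frontier_mask = np.array([False, False, True, False, True, True])  # beacon candidates
neighbor_mask = np.array([False, True, False, True, False, False])  # neighbours of node 0
beacon, waypoint, bp, wp = hierarchical_step(features, 0, frontier_mask, neighbor_mask, params)
print(beacon, waypoint)
```

Separating the decoders this way keeps the high-level choice over a large candidate set independent of the low-level choice, which is restricted to the robot's immediate neighbours; a trained policy would sample from these distributions during learning rather than taking the `argmax`.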
Abstract
In this paper, we introduce HDPlanner, a deep reinforcement learning (DRL)-based framework designed to tackle two core and challenging tasks for mobile robots: autonomous exploration and navigation. In both tasks, the robot must adaptively optimize its trajectory through continuous interaction with an unknown environment to achieve the task objective. Specifically, HDPlanner relies on novel hierarchical attention networks that enable the robot to reason about its belief across multiple spatial scales and to make sequences of collaborative decisions; these networks decompose long-term objectives into short-term informative task assignments and informative path planning. We further propose a contrastive learning-based joint optimization to enhance the robustness of HDPlanner. We empirically demonstrate that HDPlanner significantly outperforms state-of-the-art conventional and learning-based baselines on an extensive set of simulations, including hundreds of test maps and large-scale, complex Gazebo environments. Notably, HDPlanner achieves real-time planning while reducing travel distance by up to 35.7% compared to exploration benchmarks and by up to 16.5% compared to navigation benchmarks. Furthermore, we validate our approach on hardware, where it generates high-quality, adaptive trajectories in both indoor and outdoor environments without additional training, highlighting its real-world applicability.
