Learn Once Plan Arbitrarily (LOPA): Attention-Enhanced Deep Reinforcement Learning Method for Global Path Planning
Guoming Huang, Mingxin Hou, Xiaofang Yuan, Shuqiao Huang, Yaonan Wang
TL;DR
The paper tackles the challenge of global path planning with deep reinforcement learning, where traditional DRL methods suffer from poor convergence and generalization due to the large, noisy observation space. It introduces LOPA, an attention-enhanced DRL architecture that splits the observation into dynamic global and local views and processes them through a dual-channel network to improve reasoning about terrain. The method is validated on 2.5D multi-objective global planning tasks, showing accelerated convergence, stronger generalization compared with local-view baselines, and superior planning efficiency relative to traditional planners like A*, RRT, and H3DM. These results suggest that attention-driven observation refinement and multi-view integration can broadly improve DRL-based decision making in environments with infinite or highly variable state spaces, with practical benefits for scalable autonomous planning.
Abstract
Deep reinforcement learning (DRL) methods have recently shown promise in path planning tasks. However, when applied to global planning tasks, these methods face serious challenges such as poor convergence and generalization. To address this, we propose an attention-enhanced DRL method called LOPA (Learn Once Plan Arbitrarily). First, we analyze the causes of these problems from the perspective of DRL's observation, revealing that the traditional observation design exposes DRL to interference from irrelevant map information. Second, we develop LOPA, which uses a novel attention-enhanced mechanism to strengthen its attention to the key information in the observation. This mechanism is realized in two steps: (1) an attention model is built to transform the DRL observation into two dynamic views, local and global, guiding LOPA to focus on the key information in the given maps; (2) a dual-channel network is constructed to process these two views and integrate them, improving its reasoning capability. LOPA is validated through multi-objective global path planning experiments. The results suggest that LOPA achieves improved convergence and generalization performance as well as high path planning efficiency.
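The abstract describes the two-view attention model and the dual-channel network only at a high level. The sketch below illustrates one possible reading of that pipeline. It rests on assumptions that are ours, not the paper's: the map is a 2D grid of cell costs or elevations; the local view is a fixed-size crop centred on the agent, padded with an obstacle value at map borders; the "dynamic" global view is the bounding box spanning agent and goal, max-pooled to a fixed resolution; and each channel is a single ReLU layer whose features are concatenated before a linear head producing one value per action. The names `local_view`, `global_view`, and `DualChannelNet` are hypothetical, and the weights are random, untrained placeholders.

```python
import numpy as np


def local_view(grid, agent, size, pad_value=1.0):
    """Hypothetical local view: a size x size crop centred on the agent.
    Cells outside the map are filled with pad_value (treated as obstacles)."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the agent sits at the centre")
    half = size // 2
    padded = np.pad(grid, half, mode="constant", constant_values=pad_value)
    r, c = agent
    # After padding by `half`, original cell (r, c) sits at (r + half, c + half),
    # so this window is centred on the agent.
    return padded[r:r + size, c:c + size]


def _bin_edges(n, out):
    # Output bin i covers source indices [lo[i], hi[i]); every bin is non-empty.
    lo = (np.arange(out) * n) // out
    hi = np.maximum(((np.arange(out) + 1) * n) // out, lo + 1)
    return lo, hi


def global_view(grid, agent, goal, out_size, margin=1):
    """Hypothetical dynamic global view: crop the box spanning agent and goal
    (plus a margin), then max-pool it to out_size x out_size."""
    h, w = grid.shape
    r0 = max(min(agent[0], goal[0]) - margin, 0)
    r1 = min(max(agent[0], goal[0]) + margin + 1, h)
    c0 = max(min(agent[1], goal[1]) - margin, 0)
    c1 = min(max(agent[1], goal[1]) + margin + 1, w)
    box = grid[r0:r1, c0:c1]
    rl, rh = _bin_edges(box.shape[0], out_size)
    cl, ch = _bin_edges(box.shape[1], out_size)
    view = np.empty((out_size, out_size), dtype=grid.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Max-pooling keeps thin obstacles visible after downsampling.
            view[i, j] = box[rl[i]:rh[i], cl[j]:ch[j]].max()
    return view


class DualChannelNet:
    """Hypothetical dual-channel network: one ReLU branch per view,
    features concatenated, then a linear head with one output per action."""

    def __init__(self, local_size, global_size, hidden, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.w_local = rng.normal(0.0, 0.1, (local_size * local_size, hidden))
        self.w_global = rng.normal(0.0, 0.1, (global_size * global_size, hidden))
        self.w_out = rng.normal(0.0, 0.1, (2 * hidden, n_actions))

    def __call__(self, local, glob):
        h_local = np.maximum(local.reshape(-1) @ self.w_local, 0.0)
        h_global = np.maximum(glob.reshape(-1) @ self.w_global, 0.0)
        fused = np.concatenate([h_local, h_global])  # integrate the two views
        return fused @ self.w_out


if __name__ == "__main__":
    grid = np.zeros((20, 20))
    grid[5:15, 10] = 1.0  # a one-cell-wide wall
    agent, goal = (2, 3), (17, 16)
    loc = local_view(grid, agent, 5)
    glo = global_view(grid, agent, goal, 8)
    q = DualChannelNet(5, 8, 16, 8)(loc, glo)
    print(loc.shape, glo.shape, q.shape)  # (5, 5) (8, 8) (8,)
```

Max-pooling rather than nearest-neighbour sampling is a deliberate choice in this sketch: with nearest-neighbour sampling, the one-cell-wide wall in the usage example falls between sampled columns and disappears from the 8 x 8 global view, whereas max-pooling keeps it.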
