Learn Once Plan Arbitrarily (LOPA): Attention-Enhanced Deep Reinforcement Learning Method for Global Path Planning

Guoming Huang; Mingxin Hou; Xiaofang Yuan; Shuqiao Huang; Yaonan Wang

TL;DR

The paper tackles the challenge of global path planning with deep reinforcement learning, where traditional DRL methods suffer from poor convergence and generalization due to the large, noisy observation space. It introduces LOPA, an attention-enhanced DRL architecture that splits the observation into dynamic global and local views and processes them through a dual-channel network to improve reasoning about terrain. The method is validated on 2.5D multi-objective global planning tasks, showing accelerated convergence, stronger generalization compared with local-view baselines, and superior planning efficiency relative to traditional planners like A*, RRT, and H3DM. These results suggest that attention-driven observation refinement and multi-view integration can broadly improve DRL-based decision making in environments with infinite or highly variable state spaces, with practical benefits for scalable autonomous planning.

Abstract

Deep reinforcement learning (DRL) methods have recently shown promise in path planning tasks. However, when applied to global planning tasks, these methods face serious challenges such as poor convergence and generalization. To address this, we propose an attention-enhanced DRL method called LOPA (Learn Once Plan Arbitrarily). First, we analyze the causes of these problems from the perspective of the DRL observation, revealing that the traditional observation design exposes DRL to interference from irrelevant map information. Second, we develop LOPA, which uses a novel attention-enhanced mechanism to focus better on the key information in the observation. This mechanism is realized in two steps: (1) an attention model transforms the DRL observation into two dynamic views, local and global, guiding LOPA to focus on the key information in the given maps; (2) a dual-channel network processes these two views and integrates them to improve reasoning capability. LOPA is validated through multi-objective global path planning experiments. The results suggest that LOPA achieves improved convergence and generalization as well as high path planning efficiency.
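The two-step mechanism described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' attention model or network: it assumes, purely for illustration, that the local view is a fixed-radius crop around the agent, that the global view is a block-averaged copy of the map with agent and goal marker channels, and that each channel is a single untrained linear layer. All names (`local_view`, `global_view`, `DualChannelQ`) and parameters (`radius`, `out_size`, `hidden`, `n_actions=8`) are hypothetical.

```python
import numpy as np

def local_view(height_map, pos, radius):
    """Crop a (2*radius+1)-wide window centred on pos (assumed form of a local view)."""
    # Edge padding keeps the window well defined near the map border.
    padded = np.pad(height_map, radius, mode="edge")
    r, c = pos
    return padded[r:r + 2 * radius + 1, c:c + 2 * radius + 1]

def global_view(height_map, pos, goal, out_size):
    """Block-average the whole map and mark agent and goal cells (assumed global view)."""
    n = height_map.shape[0]
    assert height_map.shape == (n, n) and n % out_size == 0
    block = n // out_size
    coarse = height_map.reshape(out_size, block, out_size, block).mean(axis=(1, 3))
    agent = np.zeros_like(coarse)
    target = np.zeros_like(coarse)
    agent[pos[0] // block, pos[1] // block] = 1.0
    target[goal[0] // block, goal[1] // block] = 1.0
    return np.stack([coarse, agent, target])

class DualChannelQ:
    """Two separate channels whose features are concatenated into Q-values."""

    def __init__(self, local_dim, global_dim, hidden, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        # Random, untrained weights: this only demonstrates the data flow.
        self.w_local = rng.normal(0.0, 0.1, (local_dim, hidden))
        self.w_global = rng.normal(0.0, 0.1, (global_dim, hidden))
        self.w_out = rng.normal(0.0, 0.1, (2 * hidden, n_actions))

    def q_values(self, local, global_):
        h_local = np.maximum(local.ravel() @ self.w_local, 0.0)     # local channel
        h_global = np.maximum(global_.ravel() @ self.w_global, 0.0)  # global channel
        return np.concatenate([h_local, h_global]) @ self.w_out     # fusion

# Usage on a random 50x50 height map (the action count of 8 is an assumption).
height_map = np.random.default_rng(1).random((50, 50))
pos, goal = (0, 0), (49, 49)
lv = local_view(height_map, pos, radius=3)
gv = global_view(height_map, pos, goal, out_size=10)
net = DualChannelQ(lv.size, gv.size, hidden=16, n_actions=8)
q = net.q_values(lv, gv)
action = int(np.argmax(q))
```

Both views have a fixed size regardless of where the agent is, and both change as the agent moves, which is one plausible reading of "dynamic views". The real attention model may select regions quite differently.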

Paper Structure (35 sections, 6 equations, 12 figures, 6 tables)

Introduction
Motivation
Related Works
Path planning using the orientation information
Path planning using the local map and orientation information
Path planning using the entire map with a moving object
Contributions
Organization of this paper
Preliminaries
2.5D path planning environment
Problem analysis
Planning on a single map
Planning on random maps
Design of LOPA
Structure of LOPA
...and 20 more sections

Figures (12)

Figure 1: The utilization of map information in planning. (a) The given map. (b) The planning process. (c) The useful information. (d) The noise.
Figure 2: The LOPA method for 2.5D path planning. The upper row illustrates the framework of LOPA. The lower row depicts the path planning process using LOPA. Starting from the right side, LOPA selects an action, which is then executed by the environment, yielding a new state. After several iterations, the optimal path is obtained through repeated action selection and execution.
Figure 3: Operation cases of the attention model. Case 1 shows Mario at the start position, whereas Case 2 shows Mario after moving to a specific location on the map. These cases indicate that, by leveraging the attention model, LOPA can consistently concentrate on crucial regions during planning, thereby attaining superior convergence and planning efficacy.
Figure 4: Performance curves of different methods during training on a 50×50 map. On the left, the convergence of the dueling DQN is erratic and unstable, whereas on the right, LOPA converges far more stably.
Figure 5: Case 1 of experiment 2
...and 7 more figures
