Multi-AGV Path Planning Method via Reinforcement Learning and Particle Filters
Shao Shuo
TL;DR
This paper tackles multi-AGV path planning under uncertainty by integrating Particle Filters with a Double Deep Q-Network (PF-DDQN). The core idea is to treat neural network weights as latent state and use PF to iteratively refine true weights, thereby reducing variance and accelerating convergence relative to standard DDQN and EKF-DDQN baselines. Through two simulation studies (single and multi-AGV scenarios) on a 2D grid with obstacles, PF-DDQN achieves faster learning and improved path quality, with reported gains of up to 92.62% in path planning performance and 76.88% in training time over DDQN. The work demonstrates the practical potential of combining probabilistic state estimation with deep RL for robust, real-time multi-robot planning in complex environments.
Abstract
Thanks to its robust learning and search stabilities,the reinforcement learning (RL) algorithm has garnered increasingly significant attention and been exten-sively applied in Automated Guided Vehicle (AGV) path planning. However, RL-based planning algorithms have been discovered to suffer from the substantial variance of neural networks caused by environmental instability and significant fluctua-tions in system structure. These challenges manifest in slow convergence speed and low learning efficiency. To tackle this issue, this paper presents a novel multi-AGV path planning method named Particle Filters - Double Deep Q-Network (PF-DDQN)via leveraging Particle Filters (PF) and RL algorithm. Firstly, the proposed method leverages the imprecise weight values of the network as state values to formulate thestate space equation.Subsequently, the DDQN model is optimized to acquire the optimal true weight values through the iterative fusion process of neural networksand PF in order to enhance the optimization efficiency of the proposedmethod. Lastly, the performance of the proposed method is validated by different numerical simulations. The simulation results demonstrate that the proposed methoddominates the traditional DDQN algorithm in terms of path planning superiority andtraining time indicator by 92.62% and 76.88%, respectively. Therefore, the proposedmethod could be considered as a vital alternative in the field of multi-AGV path planning.
