Random Network Distillation Based Deep Reinforcement Learning for AGV Path Planning
Huilin Yin, Shengkai Su, Yinjia Lin, Pengju Zhen, Karin Festl, Daniel Watzenig
TL;DR
This work addresses AGV path planning under sparse rewards by introducing a Random Network Distillation (RND) module that provides intrinsic motivation and integrating it with Proximal Policy Optimization (PPO) in continuous-action, physics-based warehouse simulations. The proposed RND-PPO framework alternates between training the RND model, whose target network is fixed, and the PPO policy. The prediction error $r_i = \|\hat f(s) - f(s)\|^2$ captures state novelty and serves as the intrinsic reward; it is added to the extrinsic reward $r_e$ to form $r = r_e + r_i$, which guides exploration. Empirical results in simple and complex, static and dynamic scenes show faster and more stable learning with RND-PPO than with PPO alone, achieving higher cumulative rewards in fewer episodes. The approach has practical value for scalable, reliable AGV deployment in dynamic warehouse environments and can be extended to other RL algorithms and more complex settings.
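The reward combination described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `RNDSketch`, the function `total_reward`, the feature size, and the learning rate are all hypothetical, and single linear maps stand in for the neural networks $f$ and $\hat f$ to keep the example short.

```python
import numpy as np


class RNDSketch:
    """Toy RND module (hypothetical): fixed random target f, trainable predictor f_hat."""

    def __init__(self, state_dim, feat_dim=8, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Target f: randomly initialised and never updated.
        self.w_target = rng.normal(size=(state_dim, feat_dim))
        # Predictor f_hat: trained to imitate the target on visited states.
        self.w_pred = rng.normal(size=(state_dim, feat_dim))
        self.lr = lr

    def intrinsic_reward(self, s):
        # r_i = || f_hat(s) - f(s) ||^2
        err = s @ self.w_pred - s @ self.w_target
        return float(np.sum(err ** 2))

    def update(self, s):
        # One gradient step on the squared prediction error,
        # taken with respect to the predictor weights only.
        err = s @ self.w_pred - s @ self.w_target
        self.w_pred -= self.lr * 2.0 * np.outer(s, err)


def total_reward(r_e, r_i):
    # r = r_e + r_i: extrinsic plus intrinsic reward.
    return r_e + r_i


rnd = RNDSketch(state_dim=4)
state = np.array([0.5, -0.2, 0.1, 0.3])
novel = rnd.intrinsic_reward(state)    # prediction error before any training
for _ in range(50):
    rnd.update(state)                  # the state is visited repeatedly
familiar = rnd.intrinsic_reward(state)  # error shrinks as the state becomes familiar
```

In this sketch the intrinsic reward for a repeatedly visited state decays toward zero, so the bonus in $r = r_e + r_i$ favours states the predictor has not yet learned to match.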
Abstract
With the flourishing development of intelligent warehousing systems, Automated Guided Vehicle (AGV) technology has grown rapidly. In intelligent warehousing environments, an AGV is required to plan an optimal path safely and rapidly in complex and dynamic surroundings. Much research has applied deep reinforcement learning to this challenge. However, in environments with sparse extrinsic rewards, these algorithms often converge slowly, learn inefficiently, or fail to reach the target. Random Network Distillation (RND), as an exploration enhancement, can effectively improve the performance of proximal policy optimization by providing additional intrinsic rewards to the AGV agent in sparse-reward environments. Moreover, most current research still uses 2D grid mazes as experimental environments, which have insufficient complexity and limited action sets. To address this limitation, we present simulation environments for AGV path planning with continuous actions and positions, which bring them closer to realistic physical scenarios. Our experiments and comprehensive analysis demonstrate that the proposed method enables the AGV to complete path planning tasks with continuous actions more rapidly in our environments. A video of part of our experiments can be found at https://youtu.be/lwrY9YesGmw.
