Vision-Based Deep Reinforcement Learning of UAV Autonomous Navigation Using Privileged Information
Junqiao Wang, Zhongliang Yu, Dong Zhou, Jiaqi Shi, Runran Deng
TL;DR
This work addresses autonomous UAV navigation under partial observability by combining deep reinforcement learning with privileged information. The proposed DPRL framework uses an asymmetric Actor-Critic architecture and asynchronous multi-agent exploration to exploit privileged perception data (noiseless depth, accurate localization, and an obstacle map) during training, which accelerates convergence and improves robustness. In TD3-based training, DPRL learns faster and reaches higher performance than TD3 and EGO-Planner-v2, and ablations confirm the value of both privileged information and a 4D action space for efficient obstacle avoidance. The results indicate potential for simulation-to-real transfer and real-world deployment, particularly in complex, unknown environments where perception is imperfect.
Abstract
Efficient autonomous navigation and obstacle avoidance in complex, unknown environments is critical for UAV applications in agricultural irrigation, disaster relief, and logistics. In this paper, we propose the DPRL (Distributed Privileged Reinforcement Learning) navigation algorithm, an end-to-end policy for high-speed autonomous UAV navigation in partially observable environments. Our approach combines deep reinforcement learning with privileged learning to mitigate the corruption of observation data caused by partial observability. We use an asymmetric Actor-Critic architecture to provide the agent with privileged information during training, which enhances the model's perceptual capabilities. Additionally, we present a multi-agent exploration strategy across diverse environments that accelerates experience collection and, in turn, model convergence. We conduct extensive simulations across various scenarios, benchmarking DPRL against state-of-the-art navigation algorithms. The results consistently show that our algorithm outperforms them in flight efficiency, robustness, and overall success rate.
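The asymmetry described above can be illustrated with a minimal sketch: the actor sees only the noisy onboard observation, while the critics, which are used only during training, score actions from the privileged state. This is not the authors' implementation. The feature dimensions, network sizes, and the choice to feed the critics the privileged state alone are assumptions made for illustration; the twin critics with a minimum follow the standard TD3 recipe, and the 4D action matches the action space mentioned in the TL;DR.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
OBS_DIM = 16   # features from noisy onboard sensing (actor input)
PRIV_DIM = 24  # features from noiseless depth, exact pose, obstacle map
ACT_DIM = 4    # 4D action space


def init_mlp(in_dim, hidden, out_dim):
    """Create parameters for a one-hidden-layer MLP."""
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }


def mlp(params, x):
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    h = np.tanh(x @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]


# The actor is the only network needed at deployment time.
actor = init_mlp(OBS_DIM, 32, ACT_DIM)
# Twin critics (as in TD3) take the privileged state plus the action.
critic_1 = init_mlp(PRIV_DIM + ACT_DIM, 32, 1)
critic_2 = init_mlp(PRIV_DIM + ACT_DIM, 32, 1)


def act(noisy_obs):
    """Map noisy observations to actions bounded in [-1, 1]."""
    return np.tanh(mlp(actor, noisy_obs))


def q_min(priv_state, action):
    """Conservative value estimate: minimum over the two critics."""
    x = np.concatenate([priv_state, action], axis=-1)
    return np.minimum(mlp(critic_1, x), mlp(critic_2, x))


# A batch of 8 transitions with random placeholder features.
noisy_obs = rng.normal(size=(8, OBS_DIM))
priv_state = rng.normal(size=(8, PRIV_DIM))
action = act(noisy_obs)
q = q_min(priv_state, action)
```

Because privileged inputs enter only through the critics, the trained actor can run on a real vehicle that has no access to noiseless depth or a ground-truth obstacle map.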
