A Survey on DRL based UAV Communications and Networking: DRL Fundamentals, Applications and Implementations
Wei Zhao, Shaoxin Cui, Wen Qiu, Zhiqiang He, Zhi Liu, Xiao Zheng, Bomin Mao, Nei Kato
TL;DR
The surveyed work addresses the challenge of optimizing UAV communications in dynamic, multi-agent wireless networks by framing key problems (power allocation, channel assignment, caching, and task offloading) as optimization models solvable with deep reinforcement learning (DRL). It systematically reviews DRL fundamentals, namely Markov decision processes (MDPs) and value-based, policy-based, actor-critic, and multi-agent methods, and maps them to UAV applications, highlighting how DRL can handle nonlinearity, uncertainty, and scale in one-hop and multi-hop topologies. The paper emphasizes hybrid approaches that combine traditional optimization with multi-agent DRL (MADRL), analyzes system-level challenges (training efficiency, non-stationarity, data requirements), and outlines open issues and future directions for robust, real-time DRL deployments on UAVs. Overall, the survey offers a practical roadmap for designing DRL-based UAV networking solutions with attention to scalability, collaboration, and adaptability in evolving 5G/6G environments.
Abstract
Unmanned aerial vehicles (UAVs) are playing an increasingly pivotal role in modern communication networks, offering flexibility and enhanced coverage for a variety of applications. However, UAV networks pose significant challenges due to their dynamic and distributed nature, particularly when dealing with tasks such as power allocation, channel assignment, caching, and task offloading. Traditional optimization techniques often struggle to handle the complexity and unpredictability of these environments, leading to suboptimal performance. This survey provides a comprehensive examination of how deep reinforcement learning (DRL) can be applied to solve these mathematical optimization problems in UAV communications and networking. Rather than simply introducing DRL methods, the focus is on demonstrating how these methods can be used to solve complex mathematical models of the underlying problems. We begin by reviewing the fundamental concepts of DRL, including value-based, policy-based, and actor-critic approaches. Then, we illustrate how DRL algorithms are applied to specific UAV network tasks, walking through each step from problem formulation to DRL implementation. By framing UAV communication challenges as optimization problems, this survey emphasizes the practical value of DRL in dynamic and uncertain environments. We also explore the strengths of DRL in handling large-scale network scenarios and its ability to adapt continuously to changes in the environment. In addition, future research directions are outlined, highlighting the potential for DRL to further enhance UAV communications and expand its applicability to more complex, multi-agent settings.
