When Learning Meets Dynamics: Distributed User Connectivity Maximization in UAV-Based Communication Networks
Bowei Li, Saugat Tripathi, Salman Hosain, Ran Zhang, Jiang Xie, Miao Wang
TL;DR
This work addresses user connectivity maximization in UAV-based communication networks (UCNs) under dynamic UAV crews and shifting user distributions. It formulates a time-coupled, mixed-integer non-convex problem and solves it with distributed multi-agent reinforcement learning (MARL). It introduces two algorithms, DUCM-1 and DUCM-2, that explore four information-exchange levels and heterogeneous training to balance convergence, performance, and adaptation. Results show that exchanging problem-specific information (Level 3) yields the best convergence, and that DUCM-2 robustly handles arbitrary UAV quits and joins after a single round of training. The proposed framework offers a scalable, distributed approach to adaptive UAV trajectory control in realistic, dynamic UCNs, with potential real-world impact for on-demand connectivity provisioning.
Abstract
Distributed management of Unmanned Aerial Vehicle (UAV) based communication networks (UCNs) has attracted increasing research attention. In this work, we study a distributed user connectivity maximization problem in a UCN. The work features a horizontal study of different levels of information exchange during the distributed iterations and accounts for dynamics in both the UAV set and the user distribution, neither of which is well addressed in existing works. Specifically, the studied problem is first formulated as a time-coupled mixed-integer non-convex optimization problem. A heuristic two-stage UAV-user association policy is proposed to determine the user connectivity more quickly. To tackle the NP-hard problem in a scalable manner, the distributed user connectivity maximization algorithm 1 (DUCM-1) is proposed under the multi-agent deep Q-learning (MA-DQL) framework. DUCM-1 focuses on designing different information exchange levels and evaluating how they impact learning convergence under stationary and dynamic user distributions. To cope with the UAV dynamics, the DUCM-2 algorithm is developed to autonomously handle arbitrary quits and join-ins of UAVs within a considered time horizon. Extensive simulations are conducted i) to conclude that exchanging state information together with a deliberately designed task-specific reward function yields the best convergence performance, and ii) to show the efficacy and robustness of DUCM-2 against the dynamics.
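As a point of reference for the MA-DQL framework mentioned above, the following is a minimal sketch of the generic Q-learning update that deep Q-learning approximates with a neural network. It is not the DUCM-1 or DUCM-2 algorithm. The function name `q_update`, the tabular representation, the state labels, and the values of `alpha` and `gamma` are assumptions made for illustration only.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step for a single agent.

    q maps (state, action) pairs to values. alpha (learning rate) and
    gamma (discount factor) are illustrative, not values from the paper.
    """
    # Greedy estimate of the best value reachable from the next state.
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    # Temporal-difference target: r + gamma * max_a' Q(s', a').
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical setup: each UAV agent keeps its own independent Q-table.
q_tables = [defaultdict(float) for _ in range(2)]
q_update(q_tables[0], "s0", 1, reward=1.0, next_state="s1", n_actions=4)
```

In a multi-agent setting, what each agent includes in `state` and `reward` depends on how much information the agents exchange, which is the design dimension the abstract describes DUCM-1 as studying.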
