Age of Information Minimization using Multi-agent UAVs based on AI-Enhanced Mean Field Resource Allocation
Yousef Emami, Hao Gao, Kai Li, Luis Almeida, Eduardo Tovar, Zhu Han
TL;DR
This work tackles the challenge of minimizing the age of information (AoI), a measure of data freshness, in large UAV swarms by casting joint UAV cruise control and ground-sensor data collection as a mean-field game. It introduces MF-HPPO, an AI-driven scheme that combines proximal policy optimization (PPO) with a mean-field formulation and an LSTM layer to handle mixed discrete-continuous actions and time-varying network states. The method models the swarm as a mean-field interacting population, uses a hybrid policy to select trajectories and sensor transmissions, and leverages a PDHG-inspired mean-field update within a PPO framework. Empirical results show average AoI reductions of up to 45% versus MADQN and up to 57% versus a non-learning random baseline, along with fast convergence, underscoring the practicality of AI-enhanced mean-field resource allocation for real-time UAV-based sensing networks.
Abstract
Unmanned Aerial Vehicle (UAV) swarms play an effective role in timely data collection from ground sensors in remote and hostile areas. Optimizing the collective behavior of swarms can improve data collection performance. This paper puts forth a new mean field flight resource allocation optimization to minimize the age of information (AoI) of sensory data, where balancing the trade-off between the UAVs' movements and AoI is formulated as a mean field game (MFG). The MFG optimization yields an expansive solution space encompassing continuous state and action, resulting in significant computational complexity. To address practical situations, we propose a new mean field hybrid proximal policy optimization (MF-HPPO) scheme to minimize the average AoI by optimizing the UAVs' trajectories and the data collection scheduling of the ground sensors, given mixed continuous and discrete actions. Furthermore, a long short-term memory (LSTM) layer is leveraged in MF-HPPO to predict the time-varying network state and stabilize the training. Numerical results demonstrate that the proposed MF-HPPO reduces the average AoI by up to 45 percent and 57 percent in the considered simulation setting, as compared to the multi-agent deep Q-learning (MADQN) method and a non-learning random algorithm, respectively.
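To make the objective concrete, the following is a minimal sketch of how average AoI responds to a data collection schedule. It is not the paper's system model: it assumes discrete time slots, a single UAV that collects from at most one sensor per slot, and an AoI that resets to 1 on collection and grows by 1 otherwise. The function names `step_aoi` and `average_aoi` are hypothetical.

```python
def step_aoi(ages, collected):
    """Advance every sensor's AoI by one slot.

    Assumption: the collected sensor's AoI resets to 1; all others grow by 1.
    `collected` is a sensor index, or None if nothing is collected this slot.
    """
    return [1 if i == collected else age + 1 for i, age in enumerate(ages)]


def average_aoi(schedule, num_sensors):
    """Time-averaged, sensor-averaged AoI under a given collection schedule."""
    ages = [1] * num_sensors  # assumed initial AoI of 1 for every sensor
    total = 0.0
    for collected in schedule:
        ages = step_aoi(ages, collected)
        total += sum(ages) / num_sensors
    return total / len(schedule)


# Two illustrative schedules over six slots and three sensors.
round_robin = [0, 1, 2, 0, 1, 2]   # visit each sensor in turn
starving = [0, 0, 0, 0, 0, 0]      # always serve sensor 0

print(average_aoi(round_robin, 3))
print(average_aoi(starving, 3))
```

Under these assumptions the round-robin schedule gives a lower average AoI than the schedule that always serves one sensor, which illustrates why scheduling has to be optimized jointly with where the UAVs fly.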
