Age of Information Minimization using Multi-agent UAVs based on AI-Enhanced Mean Field Resource Allocation

Yousef Emami, Hao Gao, Kai Li, Luis Almeida, Eduardo Tovar, Zhu Han

TL;DR

This work tackles the challenge of minimizing the age of information (AoI), i.e., keeping sensory data fresh, in large UAV swarms by casting joint UAV cruise control and ground-sensor data collection scheduling as a mean-field game. It introduces MF-HPPO, an AI-driven scheme that combines proximal policy optimization (PPO) with a mean-field formulation and an LSTM layer to handle mixed discrete-continuous actions and time-varying network states. The method models the swarm as a mean-field interacting population, uses a hybrid policy to select trajectories and sensor transmissions, and leverages a PDHG-inspired mean-field update within the PPO framework. Empirical results show substantial AoI reductions (up to 45% vs. MADQN and 57% vs. a random baseline) and fast convergence, underscoring the practicality of AI-enhanced mean-field resource allocation for real-time UAV-based sensing networks.
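The two modelling ingredients named above, AoI and the mean-field interaction, can be illustrated with a minimal Python sketch. It is not the authors' system model: it assumes a common discrete-time AoI rule in which a sensor's AoI resets to 1 in a slot where a UAV collects its data and otherwise grows by one slot, and it treats each UAV's action as a 2-D velocity. The helpers `update_aoi`, `average_aoi`, and `mean_field_action` are hypothetical names introduced only for illustration.

```python
from typing import List, Sequence, Set, Tuple


def update_aoi(aoi: List[int], collected: Set[int]) -> List[int]:
    """Advance every sensor's AoI by one time slot.

    Assumed rule (illustrative, not from the paper): a sensor served in this
    slot resets to 1; every other sensor ages by one slot.
    """
    return [1 if k in collected else a + 1 for k, a in enumerate(aoi)]


def average_aoi(aoi: Sequence[int]) -> float:
    """Network-wide average AoI, the quantity the scheme aims to minimize."""
    return sum(aoi) / len(aoi)


def mean_field_action(actions: Sequence[Tuple[float, float]], i: int) -> Tuple[float, float]:
    """Mean of the other UAVs' 2-D velocity actions, as seen by UAV i.

    Replacing the N-1 individual neighbours with this single average is the
    mean-field approximation that keeps the input size constant as N grows.
    """
    others = [a for j, a in enumerate(actions) if j != i]
    if not others:  # a lone UAV has no population to average over
        return (0.0, 0.0)
    n = len(others)
    return (sum(a[0] for a in others) / n, sum(a[1] for a in others) / n)


if __name__ == "__main__":
    aoi = [1, 1, 1, 1]                  # four hypothetical sensors
    for served in ({0}, {1, 2}, {3}):  # hypothetical collection schedule
        aoi = update_aoi(aoi, served)
    print(aoi, average_aoi(aoi))
    velocities = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
    print(mean_field_action(velocities, 0))
```

Because UAV i only consumes the single averaged term, the input to its policy stays the same size whether the swarm has 3 or 300 UAVs, which is the usual scalability argument for mean-field formulations.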

Abstract

Unmanned Aerial Vehicle (UAV) swarms play an effective role in timely data collection from ground sensors in remote and hostile areas. Optimizing the collective behavior of swarms can improve data collection performance. This paper puts forth a new mean field flight resource allocation optimization to minimize the age of information (AoI) of sensory data, where balancing the trade-off between the UAVs' movements and AoI is formulated as a mean field game (MFG). The MFG optimization yields an expansive solution space encompassing continuous states and actions, resulting in significant computational complexity. To address practical situations, we propose a new mean field hybrid proximal policy optimization (MF-HPPO) scheme to minimize the average AoI by optimizing the UAVs' trajectories and the data collection scheduling of the ground sensors given mixed continuous and discrete actions. Furthermore, a long short-term memory (LSTM) layer is leveraged in MF-HPPO to predict the time-varying network state and stabilize the training. Numerical results demonstrate that the proposed MF-HPPO reduces the average AoI by up to 45 percent and 57 percent in the considered simulation setting, as compared to the multi-agent deep Q-learning (MADQN) method and a non-learning random algorithm, respectively.
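The hybrid discrete-continuous action and the PPO update mentioned in the abstract can be sketched as follows. This is an illustrative assumption, not the MF-HPPO implementation: a discrete head chooses which ground sensor to schedule via a softmax over logits, a continuous head draws a single hypothetical cruise-speed value from a Gaussian, and the two log-probabilities are summed so that one clipped PPO ratio covers the joint action. In the paper these parameters would come from the LSTM-equipped actor network; here they are plain numbers, and every function name is hypothetical.

```python
import math
import random
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert discrete-head logits into probabilities over ground sensors."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_log_prob(x: float, mean: float, std: float) -> float:
    """Log-density of a 1-D Gaussian, used for the continuous head."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def sample_hybrid_action(
    logits: List[float], mean: float, std: float, rng: random.Random
) -> Tuple[int, float, float]:
    """Sample (sensor index, speed) and return the joint log-probability.

    Assumption: the discrete and continuous heads are independent given the
    state, so their log-probabilities add.
    """
    probs = softmax(logits)
    sensor = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    speed = rng.gauss(mean, std)
    log_prob = math.log(probs[sensor]) + gaussian_log_prob(speed, mean, std)
    return sensor, speed, log_prob


def ppo_clipped_objective(
    new_log_prob: float, old_log_prob: float, advantage: float, eps: float = 0.2
) -> float:
    """PPO clipped surrogate for one sample (to be maximized)."""
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Taking the minimum gives a pessimistic bound that discourages large policy jumps.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    rng = random.Random(0)
    sensor, speed, log_prob = sample_hybrid_action([0.5, 1.0, -0.2], mean=10.0, std=2.0, rng=rng)
    print(f"sensor={sensor}, speed={speed:.2f}, log_prob={log_prob:.3f}")
    # The new policy is twice as likely to pick this action, so the ratio of about 2.0 is clipped to 1.2.
    print(ppo_clipped_objective(new_log_prob=0.0, old_log_prob=math.log(0.5), advantage=1.0))
```

Summing the log-probabilities assumes the two heads are conditionally independent given the state; hybrid PPO variants may instead clip the discrete and continuous ratios separately, so this is one possible design rather than the paper's exact update.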

Paper Structure

This paper contains 22 sections, 27 equations, 6 figures, 2 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Mean field representation of UAVs-assisted sensor networks.
  • Figure 2: An overview of MF-HPPO, where each UAV is equipped with the LSTM layer to optimize discrete and continuous actions using hybrid policy.
  • Figure 3: Performance evaluation of MF-HPPO when varying the number of UAVs and ground sensors.
  • Figure 4: The network cost for each episode of MF-HPPO with I=30 and benchmarks.
  • Figure 5: MF-HPPO trajectory distributions for various UAV counts and ground sensor distributions.