Graph Attention-based Reinforcement Learning for Trajectory Design and Resource Assignment in Multi-UAV Assisted Communication

Zikai Feng, Di Wu, Mengxing Huang, Chau Yuen

TL;DR

The paper tackles the challenge of designing UAV trajectories and allocating radio resources in a multi-UAV downlink under unknown environments by formulating the problem as a Markov game and addressing it with a graph-attention multi-agent trust region reinforcement learning (GA-MATR) framework. It integrates a Graph Recurrent Network to capture complex topology and a graph attention mechanism to weight inter-agent information, enabling a critic to provide more reliable feedback and supporting monotonic policy improvement toward an approximate Nash equilibrium. The authors introduce standard deviation regularization to promote fairness, perform a complexity analysis, and prove an equilibrium property, with extensive simulations showing superior convergence and cumulative rewards compared to baselines across 2–4 UAV scenarios. The work offers a scalable, data-efficient approach for real-time trajectory design and resource assignment in UAV-assisted communication, with potential impact on 6G networks and disaster-response deployments.
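
The fairness idea mentioned above can be illustrated with a small sketch. The summary does not give the exact form of the standard deviation regularization, so the formula below (mean per-user rate minus a weighted standard deviation) and the names `fairness_regularized_reward` and `penalty_weight` are assumptions made purely for illustration, not the authors' definition.

```python
import statistics


def fairness_regularized_reward(rates, penalty_weight=0.5):
    """Mean per-user rate minus a weighted population standard deviation.

    Hypothetical form: the formula and `penalty_weight` are illustrative
    assumptions, not taken from the paper.
    """
    if not rates:
        raise ValueError("rates must be non-empty")
    mean_rate = statistics.fmean(rates)
    spread = statistics.pstdev(rates)  # zero when all users get equal rates
    return mean_rate - penalty_weight * spread


# Same mean rate, different spread: the even allocation scores higher.
even = fairness_regularized_reward([2.0, 2.0, 2.0])
uneven = fairness_regularized_reward([0.0, 2.0, 4.0])
```

Under this assumed form, two allocations with the same average rate are ranked by how evenly the rate is shared, which is the behavior a fairness penalty is meant to encourage.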

Abstract

In multiple unmanned aerial vehicle (UAV)-assisted downlink communication, it is challenging for UAV base stations (UAV BSs) to realize trajectory design and resource assignment in unknown environments. The cooperation and competition between UAV BSs in the communication network lead to a Markov game problem. Multi-agent reinforcement learning is a promising approach to this decision-making problem. However, common issues such as system instability and low utilization of historical data still limit its application. In this paper, a novel graph-attention multi-agent trust region (GA-MATR) reinforcement learning framework is proposed to solve the multi-UAV assisted communication problem. A graph recurrent network is introduced to process and analyze the complex topology of the communication network, so as to extract useful information and patterns from observations. The attention mechanism provides additional weighting for conveyed information, so that the critic network can accurately evaluate the value of each UAV BS's behavior. This provides more reliable feedback signals and helps the actor network update the strategy more effectively. Ablation simulations indicate that the proposed approach attains improved convergence over the baselines. UAV BSs learn the optimal communication strategies to achieve their maximum cumulative rewards. Additionally, the multi-agent trust region method with monotonic convergence provides an estimated Nash equilibrium for the multi-UAV assisted communication Markov game.
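
The weighting of conveyed information described in the abstract can be sketched as follows. The abstract does not specify the scoring function, so this example swaps in plain scaled dot-product attention with a softmax as a stand-in. The function name `attention_aggregate` and the toy query, key, and value vectors are hypothetical.

```python
import math


def attention_aggregate(query, neighbor_keys, neighbor_values):
    """Softmax-weighted sum of neighbor values (scaled dot-product scores).

    Stand-in for a graph attention layer; the paper's exact scoring
    function is not given in the abstract.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in neighbor_keys
    ]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(neighbor_values[0])
    aggregated = [
        sum(w * value[d] for w, value in zip(weights, neighbor_values))
        for d in range(dim)
    ]
    return weights, aggregated


# One agent attends to two neighbors; the first key matches the query better.
weights, message = attention_aggregate(
    query=[1.0, 0.0],
    neighbor_keys=[[1.0, 0.0], [0.0, 1.0]],
    neighbor_values=[[10.0, 0.0], [0.0, 10.0]],
)
```

The neighbor whose key aligns with the query receives the larger weight, so its information dominates the aggregated message that would be passed on, for instance to a critic.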

Paper Structure

This paper contains 13 sections, 24 equations, 12 figures, 4 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Schematic diagram of UAV-assisted communication system
  • Figure 2: The time series decision-making process with three UAV BSs and nine GUs
  • Figure 3: Framework of graph attention-based multi-agent trust region reinforcement learning
  • Figure 4: Mean reward curves with two UAV BSs
  • Figure 5: Trajectories and pairing with two UAV BSs. GU 1(2,1) means that GU 1 is paired with UAV BS 2 in the start and is paired with UAV BS 1 in the end, etc.
  • ...and 7 more figures