Dynamic Trajectory and Power Control in Ultra-Dense UAV Networks: A Mean-Field Reinforcement Learning Approach

Fei Song, Zhe Wang, Jun Li, Long Shi, Wen Chen, Shi Jin

TL;DR

The paper proposes the maximum entropy mean-field deep Q network (ME-MFDQN), a model-free mean-field reinforcement learning algorithm that solves for the mean-field equilibrium in both fully and partially observable scenarios.

Abstract

In ultra-dense unmanned aerial vehicle (UAV) networks, it is challenging to coordinate resource allocation and interference management among a large number of UAVs so as to provide flexible and efficient service coverage to the ground users (GUs). In this paper, we propose a learning-based resource allocation scheme for an ultra-dense UAV communication network in which the GUs' service demands are time-varying with unknown distributions. We formulate the non-cooperative game among multiple co-channel UAVs as a stochastic game, where each UAV jointly optimizes its trajectory, user association, and downlink power control to maximize its expected local cumulative energy efficiency under interference and energy constraints. To cope with the scalability issue in a large-scale network, we further formulate the problem as a mean-field game (MFG), which reduces the interactions among the UAVs to a two-player game between a representative UAV and the mean field. We prove the existence and uniqueness of the equilibrium for the MFG, and propose a model-free mean-field reinforcement learning algorithm, named maximum entropy mean-field deep Q network (ME-MFDQN), to solve for the mean-field equilibrium in both fully and partially observable scenarios. The simulation results show that the proposed algorithm improves the energy efficiency compared with the benchmark algorithms. Moreover, the performance improves further if the GUs' service demands exhibit higher temporal correlation or if the UAVs can observe a wider range of nearby GUs.

Paper Structure

This paper contains 19 sections, 3 theorems, 46 equations, 12 figures, 2 tables, and 1 algorithm.

Key Result

Theorem 1

(Existence and uniqueness of mean-field equilibrium). For any initial $\bm{\mathcal{L}} \in \mathcal{M}$, the fixed-point iteration $\bm{\mathcal{L}}' = \Upsilon_2(\Upsilon_1(\bm{\mathcal{L}}), \bm{\mathcal{L}})$ converges to the unique stationary mean-field equilibrium.
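
The shape of the iteration in Theorem 1 can be illustrated with a small toy example. The sketch below is not the paper's construction: this summary does not give the definitions of $\Upsilon_1$ and $\Upsilon_2$, so `upsilon_1` (a softmax, entropy-regularized policy response to the mean field) and `upsilon_2` (a one-step population update under that policy) are hypothetical stand-ins, and the three-cell grid with its `DEMAND`, `CONGESTION`, `MOVE_COST`, and `TEMPERATURE` values is assumed purely for illustration. It only shows how alternating a policy update and a mean-field update can be run until the mean field stops changing.

```python
import math

# All constants below are hypothetical toy values, not taken from the paper.
DEMAND = [1.0, 0.5, 0.2]   # assumed service demand in each of three cells
CONGESTION = 0.5           # assumed penalty weight for crowded cells
MOVE_COST = 0.2            # assumed cost per cell travelled
TEMPERATURE = 1.0          # softmax temperature (entropy regularization)


def upsilon_1(mean_field):
    """Toy policy update: softmax response of each state to the mean field."""
    n = len(mean_field)
    policy = []
    for s in range(n):
        logits = [
            (DEMAND[a] - CONGESTION * mean_field[a] - MOVE_COST * abs(s - a))
            / TEMPERATURE
            for a in range(n)
        ]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - peak) for x in logits]
        total = sum(weights)
        policy.append([w / total for w in weights])
    return policy


def upsilon_2(policy, mean_field):
    """Toy mean-field update: move the population one step under the policy."""
    n = len(mean_field)
    return [sum(mean_field[s] * policy[s][a] for s in range(n)) for a in range(n)]


def solve_fixed_point(mean_field, tol=1e-12, max_iters=1000):
    """Iterate L' = upsilon_2(upsilon_1(L), L) until the L1 change is below tol."""
    for iteration in range(1, max_iters + 1):
        updated = upsilon_2(upsilon_1(mean_field), mean_field)
        gap = sum(abs(u - m) for u, m in zip(updated, mean_field))
        mean_field = updated
        if gap < tol:
            return mean_field, iteration
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # Two different initial distributions should reach the same fixed point.
    for start in ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]):
        field, steps = solve_fixed_point(start)
        print(steps, [round(x, 4) for x in field])
```

In this toy setting the congestion weight is kept small relative to the temperature so that the combined map behaves like a contraction, which is why both starting points settle on the same distribution. Whether and how the paper's operators satisfy a comparable condition is established in its proof, not by this sketch.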

Figures (12)

  • Figure 1: An ultra-dense UAV network provides communication services to GUs with time-varying service demands.
  • Figure 2: Division of a time slot.
  • Figure 3: The two-step iterative solution for MFG.
  • Figure 4: Average reward, energy efficiency and interference penalty per episode received by the representative agent under different MFRL algorithms: (a) average reward; (b) energy efficiency; (c) interference penalty.
  • Figure 5: Average reward per episode for the active UAVs.
  • ...and 7 more figures

Theorems & Definitions (7)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Definition 4
  • Lemma 1
  • Lemma 2