Traffic Learning and Proactive UAV Trajectory Planning for Data Uplink in Markovian IoT Models

Eslam Eldeeb; Mohammad Shehab; Hirley Alves

TL;DR

The paper tackles minimizing the age of information (AoI), that is, keeping collected data fresh, in IoT networks by proactively planning UAV trajectories from traffic predicted under a Markovian event-driven activity model. It compares a classical forward algorithm (FA) with a long short-term memory (LSTM) predictor, then uses a Deep Q-Network (DQN) to jointly optimize UAV paths and scheduling, with a reward that balances AoI, regret, and energy through the tunable weights $\zeta_1$ and $\zeta_2$. The contributions include a two-stage framework (traffic prediction followed by UAV learning), a detailed MDP formulation with deterministic transitions, and a DQN solution. In simulations, the LSTM-DRL approach comes close to genie-aided performance and outperforms a random-walk baseline. The results highlight the benefit of data-driven traffic forecasting for proactive UAV-based data collection; the paper also notes computational trade-offs and suggests meta-learning for dynamic environments as future work.

Abstract

The age of information (AoI) measures the freshness of data. In IoT networks, traditional resource management schemes rely on a message exchange between the devices and the base station (BS) before communication, which causes high AoI, high energy consumption, and low reliability. Unmanned aerial vehicles (UAVs) acting as flying BSs offer many advantages in minimizing the AoI, saving energy, and improving throughput. In this paper, we present a novel learning-based framework that estimates the traffic arrival of IoT devices based on Markovian events. The framework then optimizes the trajectories of multiple UAVs and their scheduling policy. First, the BS predicts the future traffic of the devices. We compare two traffic predictors: the forward algorithm (FA) and long short-term memory (LSTM). Afterward, we propose a deep reinforcement learning (DRL) approach to find the optimal policy of each UAV. Finally, we tune the reward function of the proposed DRL approach. Simulation results show that the proposed algorithm outperforms the random-walk (RW) baseline model in terms of AoI, scheduling accuracy, and transmission power.

Paper Structure (34 sections, 32 equations, 7 figures, 3 tables, 3 algorithms)

Introduction
Related Literature
Contribution
Outline
System Model
System Analysis
Traffic Arrival
Problem formulation
Age of Information
Accumulative regret
Transmission power
Joint optimization problem
The Traffic Prediction Stage
The Forward Algorithm
Long Short-Term Memory
...and 19 more sections

Figures (7)

Figure 1: The system model: IoT devices are served by multiple UAVs that relay the information to the BS located at the center of the grid world.
Figure 2: The activation of $D$ devices is modeled as a Markovian arrival of $K$ binary events. If an event $k$ is active, it influences a device $d$ with an activation probability of $p_{dk}$.
Figure 3: The stages of the proposed algorithm.
Figure 4: Flow chart of the proposed algorithm. First, the BS estimates the traffic using the LSTM or the FA in the traffic estimation stage. Then, the optimal policy of the UAVs is optimized in the UAV learning stage.
Figure 5: (a) MSE of the FA activation probability prediction and training and validation losses of LSTM. (b) Trajectory path of 2-UAVs serving a network of $D = 10$ devices using the LSTM as the traffic predictor. The values for $\zeta_1$ and $\zeta_2$ are $25$ and $500$, respectively. The lower devices have a higher activation probability than the rest of the devices.
...and 2 more figures
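
The traffic model in the Figure 2 caption can be illustrated with a short simulation. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `simulate_traffic`, the per-slot switching probabilities `p_on` and `p_off`, the all-inactive initial state, and the rule that a device is active when at least one active event triggers it are all hypothetical choices made for illustration. Only the $K$ binary Markovian events and the activation probabilities $p_{dk}$ come from the caption.

```python
import random


def simulate_traffic(p_dk, p_on, p_off, num_slots, seed=0):
    """Simulate device activations driven by K binary Markov events.

    p_dk[d][k]: probability that an active event k activates device d.
    p_on, p_off: ASSUMED per-slot probabilities that an event switches
                 from inactive to active and from active to inactive.
    Returns (events, activations): one 0/1 list per time slot each.
    """
    rng = random.Random(seed)
    num_devices, num_events = len(p_dk), len(p_dk[0])
    state = [0] * num_events  # ASSUMPTION: all events start inactive
    events, activations = [], []
    for _ in range(num_slots):
        # Each event evolves as an independent two-state Markov chain.
        state = [
            (0 if rng.random() < p_off else 1) if s
            else (1 if rng.random() < p_on else 0)
            for s in state
        ]
        # ASSUMPTION: a device is active if at least one active event
        # triggers it (independent triggers combined with a logical OR).
        active = [
            int(any(state[k] and rng.random() < p_dk[d][k]
                    for k in range(num_events)))
            for d in range(num_devices)
        ]
        events.append(state)
        activations.append(active)
    return events, activations


# Hypothetical example: 3 devices, 2 events.
p_dk = [[0.9, 0.0], [0.0, 0.5], [0.3, 0.3]]
events, activations = simulate_traffic(p_dk, p_on=0.2, p_off=0.4, num_slots=1000)
rates = [sum(slot[d] for slot in activations) / 1000 for d in range(3)]
print("empirical activation rates:", rates)
```

A traffic predictor such as the FA or an LSTM would see only the `activations` sequence; the hidden `events` sequence is returned here purely so that a prediction can be checked against the ground truth.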
