Traffic Learning and Proactive UAV Trajectory Planning for Data Uplink in Markovian IoT Models

Eslam Eldeeb, Mohammad Shehab, Hirley Alves

TL;DR

The paper aims to keep data fresh in IoT networks by minimizing the age of information (AoI), proactively planning UAV trajectories from traffic predicted under a Markovian, event-driven activity model. It compares a classical forward algorithm (FA) with a deep-learning LSTM predictor, then uses a Deep Q-Network (DQN) to jointly optimize UAV paths and scheduling, with a reward that balances AoI, regret, and energy through the tunable weights $\zeta_1$ and $\zeta_2$. The contributions are a two-stage framework (traffic prediction followed by UAV learning), a detailed MDP formulation with deterministic transitions, and a DQN solution. In simulations, the LSTM-DRL approach comes close to genie-aided performance and outperforms a random-walk baseline. The results highlight the benefit of data-driven traffic forecasting for proactive UAV-based data collection; the paper also notes computational trade-offs and suggests meta-learning for dynamic environments as future work.

Abstract

The age of information (AoI) measures the freshness of data. In IoT networks, traditional resource management schemes rely on a message exchange between the devices and the base station (BS) before communication, which causes high AoI, high energy consumption, and low reliability. Unmanned aerial vehicles (UAVs) acting as flying BSs offer many advantages in minimizing the AoI, saving energy, and improving throughput. In this paper, we present a novel learning-based framework that estimates the traffic arrival of IoT devices based on Markovian events. The learning then optimizes the trajectories of multiple UAVs and their scheduling policy. First, the BS predicts the future traffic of the devices. We compare two traffic predictors: the forward algorithm (FA) and the long short-term memory (LSTM). Afterward, we propose a deep reinforcement learning (DRL) approach to find the optimal policy of each UAV. Finally, we tune the reward function of the proposed DRL approach. Simulation results show that the proposed algorithm outperforms the random-walk (RW) baseline model in terms of AoI, scheduling accuracy, and transmission power.
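To make the AoI objective concrete, the following minimal sketch shows a common discrete-time AoI update and one possible shape for a reward weighted by $\zeta_1$ and $\zeta_2$. It is an illustration under stated assumptions, not the paper's formulation: resetting AoI to 1 on service, the pairing of $\zeta_1$ with a regret count and $\zeta_2$ with transmission power, and the names `step_aoi`, `slot_reward`, `wasted_visits`, and `tx_power` are all hypothetical.

```python
import numpy as np


def step_aoi(aoi, served):
    """Advance per-device AoI by one time slot.

    Assumption: a device served in this slot resets to an age of 1,
    while every other device ages by one slot.
    """
    aoi = np.asarray(aoi)
    served = np.asarray(served, dtype=bool)
    return np.where(served, 1, aoi + 1)


def slot_reward(aoi, wasted_visits, tx_power, zeta_1, zeta_2):
    """Hypothetical per-slot reward (not the paper's exact formula).

    Penalizes the mean AoI, a regret term (here: the number of UAV
    visits to devices that had nothing to send), and transmission power.
    The weight-to-term pairing is an assumption made for illustration.
    """
    penalty = float(np.mean(aoi)) + zeta_1 * wasted_visits + zeta_2 * tx_power
    return -penalty


# Three devices; only the second one is served in this slot.
aoi = step_aoi([3, 1, 5], [False, True, False])
reward = slot_reward(aoi, wasted_visits=1, tx_power=0.01, zeta_1=25, zeta_2=500)
```

Larger weights make the agent more reluctant to waste visits or spend power, at the cost of a higher AoI; the sketch only shows how such a trade-off could be encoded.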

Paper Structure (34 sections, 32 equations, 7 figures, 3 tables, 3 algorithms)

Figures (7)

  • Figure 1: The system model: IoT devices are served by multiple UAVs that relay the information to the BS located at the center of the grid world.
  • Figure 2: The activation of $D$ devices is modeled as a Markovian arrival of $K$ binary events. If an event $k$ is active, it influences a device $d$ with an activation probability of $p_{dk}$.
  • Figure 3: The stages of the proposed algorithm.
  • Figure 4: Flow chart of the proposed algorithm. First, the BS estimates the traffic using the LSTM or the FA in the traffic estimation stage. Then, the optimal policy of the UAVs is optimized in the UAV learning stage.
  • Figure 5: (a) MSE of the FA activation probability prediction, and training and validation losses of the LSTM. (b) Trajectories of two UAVs serving a network of $D = 10$ devices using the LSTM as the traffic predictor. The values of $\zeta_1$ and $\zeta_2$ are $25$ and $500$, respectively. The lower devices have a higher activation probability than the rest of the devices.
  • ...and 2 more figures
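
The activation model in the Figure 2 caption can be sketched as a small simulation. This is a hedged illustration, not the authors' simulator: it assumes each of the $K$ events follows an independent two-state Markov chain with hypothetical switching probabilities `p_on` and `p_off`, and that a device becomes active when at least one active event triggers it with probability $p_{dk}$. How several simultaneous events combine is an assumption here.

```python
import numpy as np


def activation_prob(p_dk, events):
    """Probability that each device is active given the current event states.

    Assumption: active events trigger a device independently, so
    P(device d active) = 1 - prod_k (1 - p_dk * e_k).
    """
    e = np.asarray(events, dtype=float)          # shape (K,)
    return 1.0 - np.prod(1.0 - np.asarray(p_dk) * e, axis=1)


def simulate_activity(p_dk, p_on, p_off, n_slots, seed=0):
    """Simulate device activity driven by K binary Markovian events.

    p_dk  : (D, K) array, activation probability of device d under event k
    p_on  : hypothetical probability that an inactive event switches on
    p_off : hypothetical probability that an active event switches off
    Returns an (n_slots, D) boolean array of device activations.
    """
    p_dk = np.asarray(p_dk, dtype=float)
    rng = np.random.default_rng(seed)
    n_devices, n_events = p_dk.shape
    events = np.zeros(n_events, dtype=bool)      # all events start inactive
    active = np.zeros((n_slots, n_devices), dtype=bool)
    for t in range(n_slots):
        u = rng.random(n_events)
        # Two-state chain per event: stay on w.p. 1 - p_off, turn on w.p. p_on.
        events = np.where(events, u >= p_off, u < p_on)
        # Each active event k triggers device d with probability p_dk.
        triggers = rng.random((n_devices, n_events)) < p_dk
        active[t] = (triggers & events).any(axis=1)
    return active


# Two devices, two events; the first device is more sensitive to both events.
p_dk = np.array([[0.9, 0.5], [0.1, 0.2]])
trace = simulate_activity(p_dk, p_on=0.3, p_off=0.4, n_slots=1000)
empirical_rate = trace.mean(axis=0)              # per-device activity rate
```

A traffic predictor such as the FA or the LSTM would take traces like `trace` as its observations; the closed-form `activation_prob` is what a genie with knowledge of the event states could compute directly.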