Meta Reinforcement Learning for Strategic IoT Deployments Coverage in Disaster-Response UAV Swarms
Marwan Dhuheir, Aiman Erbad, Ala Al-Fuqaha
TL;DR
The paper addresses energy-efficient path planning for a dynamic swarm of UAVs that collects data from ground IoT devices in disaster scenarios, with priority given to covering strategic locations. It formulates an NP-hard optimization problem that minimizes total energy consumption subject to minimum data-rate and time constraints, and proposes a lightweight meta-reinforcement learning (Meta-RL) solution that adapts quickly when UAVs join or leave the swarm. The approach models the wireless channel in detail (LoS/NLoS) along with data-delivery delays, and defines an energy model that accounts for operational power, communication power, and passes over strategic locations. In simulations on an urban grid, the Meta-RL method outperforms PPO, Actor-Critic, and DQN baselines, with faster convergence, higher service satisfaction at strategic locations, and greater resilience to swarm dynamics. The work provides a practical framework for robust, energy-conscious IoT data collection in rapidly changing disaster-response environments, with potential impact on real deployments and emergency communications.
Abstract
In the past decade, Unmanned Aerial Vehicles (UAVs) have attracted the attention of researchers in academia and industry for their potential use in critical emergency applications, such as providing wireless services to ground users and collecting data from areas affected by disasters, owing to their maneuverability and flexibility of movement. The UAVs' limited resources, energy budget, and strict mission completion time pose challenges to adopting UAVs for these applications. Our system model considers a UAV swarm that navigates an area and collects data from ground IoT devices, focusing on providing better service to strategic locations and allowing UAVs to join and leave the swarm dynamically (e.g., for recharging). In this work, we introduce an optimization model that aims to minimize the total energy consumption and provide optimal path planning for the UAVs under constraints on minimum completion time and transmit power. The formulated optimization problem is NP-hard, which makes it impractical for real-time decision making. Therefore, we introduce a lightweight meta-reinforcement learning solution that can also cope with sudden changes in the environment through fast convergence. We conduct extensive simulations and compare our approach to three state-of-the-art learning models. Our simulation results show that the proposed approach outperforms the three state-of-the-art algorithms in providing coverage to strategic locations, with fast convergence.
