Deep Reinforcement Learning Enabled Persistent Surveillance with Energy-Aware UAV-UGV Systems for Disaster Management Applications
Md Safwan Mondal, Subramanian Ramasamy, Pranav Bhounsule
TL;DR
This paper tackles energy-constrained, persistent disaster surveillance using a cooperative UAV-UGV system in which a mobile UGV recharges a UAV. It introduces a transformer-based DRL policy, trained with REINFORCE, that jointly plans UAV and UGV routes, including their rendezvous points. The problem is posed as an open-ended routing problem with EVRPTW-like (electric vehicle routing problem with time windows) constraints and an objective based on the age periods of mission points, that is, the time between consecutive visits. The approach is evaluated against heuristic baselines and a learning-based model across varied problem sizes and distributions, and in a Hurricane Harvey (2017) case study. It shows better solution quality and scalability than the baselines, and it adapts to dynamic changes and priority weighting. The results indicate that mission points are visited more frequently and that the policy supports online replanning, which points to practical potential for real-time disaster response with energy-aware UAV-UGV cooperation.
Abstract
Integrating Unmanned Aerial Vehicles (UAVs) with Unmanned Ground Vehicles (UGVs) provides an effective solution for persistent surveillance in disaster management. UAVs excel at covering large areas rapidly, but their range is limited by battery capacity. UGVs, though slower, can carry larger batteries for extended missions. By using UGVs as mobile recharging stations, UAVs can extend mission duration through periodic recharging, leveraging the complementary strengths of both systems. To optimize this energy-aware UAV-UGV cooperative routing problem, we propose a planning framework that determines optimal routes and recharging points between a UAV and a UGV. Our solution employs a deep reinforcement learning (DRL) framework built on an encoder-decoder transformer architecture with multi-head attention mechanisms. This architecture enables the model to sequentially select actions for visiting mission points and coordinating recharging rendezvous between the UAV and UGV. The DRL model is trained to minimize the age periods (the time gap between consecutive visits) of mission points, ensuring effective surveillance. We evaluate the framework across various problem sizes and distributions, comparing its performance against heuristic methods and an existing learning-based model. Results show that our approach consistently outperforms these baselines in both solution quality and runtime. Additionally, we demonstrate the DRL policy's applicability in a real-world disaster scenario as a case study and explore its potential for online mission planning to handle dynamic changes. Adapting the DRL policy for priority-driven surveillance highlights the model's generalizability for real-time disaster response.
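To make the age-period quantity concrete, the following is a minimal Python sketch of how the gaps between consecutive visits could be computed from a visit log. It is an illustration under stated assumptions, not the paper's implementation: the function names `age_periods` and `worst_age`, the `(time, point_id)` log format, and the choice to count the gaps from mission start to the first visit and from the last visit to the mission horizon are all hypothetical, since the abstract does not specify how the objective aggregates the gaps.

```python
from collections import defaultdict


def age_periods(visits, horizon):
    """Return {point_id: [gap, ...]} for a log of (time, point_id) visits.

    Assumption (not from the paper): the gap from t = 0 to the first visit
    and the gap from the last visit to `horizon` are counted as age periods.
    """
    times = defaultdict(list)
    for t, point in visits:
        times[point].append(t)

    gaps = {}
    for point, ts in times.items():
        # Bracket the sorted visit times with mission start and end.
        marks = [0.0] + sorted(ts) + [horizon]
        gaps[point] = [later - earlier for earlier, later in zip(marks, marks[1:])]
    return gaps


def worst_age(gaps):
    """Largest age period over all mission points (one possible summary)."""
    return max(max(g) for g in gaps.values())


# Hypothetical visit log: point "A" is visited twice, point "B" once.
log = [(2.0, "A"), (5.0, "B"), (9.0, "A")]
gaps = age_periods(log, horizon=12.0)
print(gaps)             # per-point age periods
print(worst_age(gaps))  # largest gap across all points
```

A policy that revisits points more often shortens these gaps, so a smaller value of a summary such as `worst_age` corresponds to more persistent coverage. Other summaries, for example a mean or a priority-weighted sum, could be substituted depending on how the objective is defined.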
