Strategic Coordination of Drones via Short-term Distributed Optimization and Long-term Reinforcement Learning
Chuhao Qin, Evangelos Pournaras
TL;DR
The paper tackles autonomous task allocation for drone swarms in dynamic spatio-temporal environments, framing it as an NP-hard decentralized combinatorial optimization problem. It proposes DO-RL, a hybrid approach in which long-term DRL selects high-level flight strategies and short-term decentralized collective learning coordinates navigation and sensing over a privacy-preserving tree topology. Empirical results on realistic urban traffic data show that DO-RL outperforms both the EPOS and MAPPO baselines in overall performance, while remaining energy-efficient and robust across varied drone densities, periods, and vehicle densities. The work highlights the complementary strengths of short-term optimization and long-term learning, which together enable energy-efficient, accurate, and scalable traffic monitoring by drone swarms, and it provides open-source code and datasets for further research.
Abstract
This paper addresses the problem of autonomous task allocation by a swarm of interactive drones in large-scale, dynamic spatio-temporal environments. When each drone independently chooses among navigation, sensing, and recharging options so that system-wide sensing requirements are met, the collective decision-making becomes an NP-hard decentralized combinatorial optimization problem. Existing solutions face significant limitations: distributed optimization methods such as collective learning often lack long-term adaptability, while centralized deep reinforcement learning (DRL) suffers from high computational complexity as well as scalability and privacy concerns. To overcome these challenges, we propose a novel hybrid optimization approach that combines long-term DRL with short-term collective learning. In this approach, each drone uses DRL to proactively determine high-level strategies, such as flight direction and recharging behavior, while leveraging collective learning to coordinate short-term sensing and navigation tasks with other drones in a decentralized manner. Extensive experiments using datasets derived from realistic urban mobility demonstrate that the proposed solution outperforms standalone state-of-the-art collective learning and DRL approaches by $27.83\%$ and $23.17\%$, respectively. Our findings highlight the complementary strengths of short-term and long-term decision-making, enabling energy-efficient, accurate, and sustainable traffic monitoring through swarms of drones.
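
The two-level structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The high-level step is a plain lookup (`plans_for`) standing in for a learned DRL policy, and the short-term step is a simple best-response sweep (`coordinate`) standing in for tree-based collective learning. All names, the plan encoding (sensing effort per grid cell), and the squared-error objective are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Plan = Sequence[float]  # hypothetical encoding: sensing effort per grid cell


def plans_for(direction: str, catalog: Dict[str, List[Plan]]) -> List[Plan]:
    """High-level step: a strategy (here a flight direction) narrows the plan set.

    In the paper this strategy comes from a learned policy; a lookup is assumed here.
    """
    return catalog[direction]


def total(selected: List[Plan]) -> List[float]:
    """Sum the selected plans cell by cell."""
    return [sum(cell) for cell in zip(*selected)]


def mismatch(aggregate: Sequence[float], target: Sequence[float]) -> float:
    """Squared error between swarm-wide sensing and the required sensing."""
    return sum((a - t) ** 2 for a, t in zip(aggregate, target))


def coordinate(options: List[List[Plan]], target: Sequence[float],
               sweeps: int = 10) -> List[int]:
    """Short-term step: each drone in turn picks the plan that best fits the target,
    given the current choices of the others (a simplified stand-in for
    collective learning, without the tree topology)."""
    choice = [0] * len(options)
    for _ in range(sweeps):
        changed = False
        for d, plans in enumerate(options):
            others = [options[i][choice[i]] for i in range(len(options)) if i != d]

            def cost(k: int) -> float:
                return mismatch(total(others + [plans[k]]), target)

            best = min(range(len(plans)), key=cost)
            if cost(best) < cost(choice[d]):  # switch only on strict improvement
                choice[d] = best
                changed = True
        if not changed:  # no drone wants to switch: stop early
            break
    return choice


# Toy data: four cells, each requiring one unit of sensing.
target = [1, 1, 1, 1]
catalog_0 = {"east": [[1, 0, 0, 0], [1, 1, 0, 0]]}
catalog_1 = {"west": [[1, 1, 0, 0], [0, 0, 1, 1]]}
options = [plans_for("east", catalog_0), plans_for("west", catalog_1)]
choice = coordinate(options, target)
```

In this toy case the sweep settles on the second plan for both drones, which together cover every cell exactly once. The split mirrors the design choice in the abstract: the strategy fixes which plans a drone may consider over a longer horizon, and the coordination step resolves the remaining short-term choice jointly with the other drones.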
