Strategic Coordination of Drones via Short-term Distributed Optimization and Long-term Reinforcement Learning

Chuhao Qin; Evangelos Pournaras

TL;DR

The paper tackles autonomous task allocation for drone swarms in dynamic spatio-temporal environments, framing it as an NP-hard decentralized combinatorial optimization problem. It proposes DO-RL, a hybrid approach in which each drone uses long-term DRL to choose high-level flight strategies and short-term, decentralized collective learning to coordinate navigation and sensing, using a privacy-preserving tree topology. Empirical results on realistic urban traffic data show that DO-RL outperforms both the EPOS and MAPPO baselines in overall performance, while maintaining energy efficiency and robust operation across varied drone densities, periods, and vehicle densities. The work highlights the complementary strengths of short-term optimization and long-term learning, enabling energy-efficient, accurate, and scalable traffic monitoring by drone swarms, and provides open-source code and datasets for further research.

Abstract

This paper addresses the problem of autonomous task allocation by a swarm of autonomous, interactive drones in large-scale, dynamic spatio-temporal environments. When each drone independently chooses among navigation, sensing, and recharging options such that system-wide sensing requirements are met, the collective decision-making becomes an NP-hard decentralized combinatorial optimization problem. Existing solutions face significant limitations: distributed optimization methods such as collective learning often lack long-term adaptability, while centralized deep reinforcement learning (DRL) suffers from high computational complexity as well as scalability and privacy concerns. To overcome these challenges, we propose a novel hybrid optimization approach that combines long-term DRL with short-term collective learning. In this approach, each drone uses DRL methods to proactively determine high-level strategies, such as flight direction and recharging behavior, while leveraging collective learning to coordinate short-term sensing and navigation tasks with other drones in a decentralized manner. Extensive experiments using datasets derived from realistic urban mobility demonstrate that the proposed solution outperforms standalone state-of-the-art collective learning and DRL approaches by $27.83\%$ and $23.17\%$, respectively. Our findings highlight the complementary strengths of short-term and long-term decision-making, enabling energy-efficient, accurate, and sustainable traffic monitoring through swarms of drones.

Paper Structure (21 sections, 17 equations, 16 figures, 5 tables, 1 algorithm)

Introduction
Related Work
System Model
Scenario and assumptions
Problem formulation
The Proposed Solution
Detailed System Design
DRL modeling
Local plan generation
Iterative plan selection
Periodic state update
Multi-agent reinforcement learning
Performance Evaluation
Experimental settings
Baselines and metrics
...and 6 more sections

Figures (16)

Figure 1: The hierarchical decision-making framework of DO-RL. The overall flow of DO-RL is depicted at the bottom; the plan selection is illustrated for two drones; the plan generation part outlines the procedure for generating a plan when drone $1$ is traveling to the east.
Figure 2: Overview of the tree communication topology. During the bottom-up process, drone 3 aggregates the plans of its children, i.e., drone 1 and drone 2, and sends them to its parent agent together with its own plan. During the top-down process, each parent agent sends the total aggregated plans to its children so that all agents obtain the observed plan.
Figure 3: Process of path finding in plan generation.
Figure 4: DRL-based long-term scheduling overview.
Figure 5: The central business district of Munich, Germany: (a) basic scenario with $64$ cells, $4$ charging stations, and a high density of vehicles; (b) the number of cells increased to $100$; (c) the number of charging stations increased to $9$; (d) a new map with a low density of vehicles.
...and 11 more figures
