Drone Swarm Energy Management
Michael Z. Zgurovsky, Pavlo O. Kasyanov, Liliia S. Paliichuk
TL;DR
This work addresses energy management for a drone swarm operating under partial observability by formulating the problem as a Gaussian POMDP and solving it with a POMDP-DDPG framework. It uses Gaussian belief reduction to convert the infinite-dimensional belief space into a 1D MDP over the belief mean $\hat{x}_t^{(i)}$, with a deterministic variance sequence $\sigma_t^2$, enabling an efficient Kalman-filter-based decision process. The authors establish that optimal policies admit a time-varying $(s_t,S_t)$ structure, where drones recharge to $S_t$ whenever $\hat{x}_t^{(i)}$ falls below $s_t$, and present four DDPG variants that exploit this structure. Empirical results show that DDPG on belief means achieves near-optimal performance with substantial training-time reductions (roughly 33%–64% faster than baselines) and scales nearly linearly with swarm size, enabling real-time, decentralized energy-aware control for large fleets with robust handling of sensor noise. These findings have practical implications for secure, efficient, and scalable autonomous operations in security, environmental monitoring, and infrastructure inspection scenarios, underpinned by theoretical guarantees on complexity and policy structure.
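The belief reduction and threshold policy described above can be illustrated with a minimal sketch. The summary does not state the energy dynamics or noise model, so everything here is an assumption made for illustration: a scalar linear model $x_{t+1} = x_t + a_t - d_t + w_t$ with observation $y_t = x_t + v_t$, the process and sensor noise variances `q` and `r`, the known drain `drain`, and all function names are hypothetical. The sketch shows two points from the text: the posterior variance evolves without reference to observations, and the recharge decision depends only on the belief mean.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    mean: float  # belief mean, plays the role of x-hat_t for one drone
    var: float   # belief variance, plays the role of sigma_t^2


def variance_sequence(var0: float, q: float, r: float, horizon: int) -> list[float]:
    """Posterior variances for t = 0..horizon under the assumed scalar model.

    No observation appears in the recursion, so the sequence can be
    computed once, before any data arrive.
    """
    out = [var0]
    var = var0
    for _ in range(horizon):
        pred = var + q                # predicted variance after process noise
        var = pred * r / (pred + r)   # posterior variance after one measurement
        out.append(var)
    return out


def kalman_step(belief: Belief, action: float, drain: float,
                y: float, q: float, r: float) -> Belief:
    """One scalar Kalman predict/update step (assumed dynamics, see lead-in)."""
    pred_mean = belief.mean + action - drain
    pred_var = belief.var + q
    gain = pred_var / (pred_var + r)
    return Belief(mean=pred_mean + gain * (y - pred_mean),
                  var=(1.0 - gain) * pred_var)


def s_S_action(mean: float, s: float, S: float) -> float:
    """Threshold rule: recharge up to S when the belief mean is below s."""
    return S - mean if mean < s else 0.0
```

In this sketch, a learned policy would only need to supply the thresholds `s` and `S` for each time step; how the four DDPG variants parameterize them is not specified in this summary.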
Abstract
This note presents an analytical framework for decision-making in drone swarm systems operating under uncertainty, based on the integration of Partially Observable Markov Decision Processes (POMDP) with Deep Deterministic Policy Gradient (DDPG) reinforcement learning. The proposed approach enables adaptive control and cooperative behavior of unmanned aerial vehicles (UAVs) within a cognitive AI platform, where each agent learns optimal energy management and navigation policies from dynamic environmental states. We extend the standard DDPG architecture with a belief-state representation derived from Bayesian filtering, allowing for robust decision-making in partially observable environments. For the Gaussian case, we numerically compare the performance of policies derived from DDPG with optimal policies for discretized versions of the original continuous problem. Simulation results demonstrate that the POMDP-DDPG-based swarm control model significantly improves mission success rates and energy efficiency compared to baseline methods. The developed framework supports distributed learning and decision coordination across multiple agents, providing a foundation for scalable cognitive swarm autonomy. The outcomes of this research contribute to the advancement of energy-aware control algorithms for intelligent multi-agent systems and can be applied in security, environmental monitoring, and infrastructure inspection scenarios.
