Scheduling Drone and Mobile Charger via Hybrid-Action Deep Reinforcement Learning
Jizhe Dou, Haotian Zhang, Guodong Sun
TL;DR
The paper addresses the problem of scheduling a drone and a mobile charger to maximize observation utility while minimizing task duration in a hybrid discrete-continuous action space. It introduces HaDMC, a hybrid-action deep reinforcement learning framework that learns a continuous latent policy (via TD3) and decodes it into joint drone and charger actions using an embedding table for discrete decisions and an adversarial autoencoder for continuous timings, with a mutual-learning pre-training phase to foster cooperation. A carefully designed reward structure guides learning toward efficient observation, timely charging, and successful task completion, including lookahead incentives and penalties for failure. Experiments across diverse deployments show that HaDMC outperforms baselines and ablations, demonstrating the value of latent-action representation and inter-agent cooperation in hybrid-action scheduling, with practical implications for extended-endurance drone operations in real-world monitoring tasks.
Abstract
Industry and academia have recently shown growing interest in using wireless chargers to prolong the operational longevity of unmanned aerial vehicles (commonly known as drones). In this paper we consider a charger-assisted drone application: a drone is deployed to observe a set of points of interest, while a mobile charger can move to recharge the drone's battery. We focus on the route and charging schedule of the drone and the mobile charger, aiming to obtain high observation utility in the shortest possible time while ensuring that the drone remains operational during task execution. Essentially, the proposed drone-charger scheduling problem is a multi-stage decision-making process in which the drone and the mobile charger act as two agents that cooperate to finish a task. The discrete-continuous hybrid action space of the two agents poses a significant challenge in this problem. To address it, we present a hybrid-action deep reinforcement learning framework, called HaDMC, which uses a standard policy learning algorithm to generate latent continuous actions. Motivated by representation learning, we design and train an action decoder with two pipelines that convert the latent continuous actions into the original discrete and continuous actions, with which the drone and the charger can interact directly with the environment. We embed a mutual learning scheme in model training, emphasizing collaborative rather than individual actions. We conduct extensive numerical experiments to evaluate HaDMC and compare it with state-of-the-art deep reinforcement learning approaches. The experimental results show the effectiveness and efficiency of our solution.
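To make the two-pipeline decoding idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the latent action is split into a discrete part and a continuous part; the discrete part is mapped to the index of the nearest row of an embedding table, and the continuous part is mapped to a bounded timing value. The function names, the toy table, and the affine-plus-sigmoid map (a simple stand-in for the learned continuous decoder) are all hypothetical.

```python
import math
from typing import Sequence, Tuple


def nearest_embedding(latent: Sequence[float],
                      table: Sequence[Sequence[float]]) -> int:
    """Return the index of the table row closest to `latent`
    (squared Euclidean distance)."""
    def sq_dist(row: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(latent, row))
    return min(range(len(table)), key=lambda i: sq_dist(table[i]))


def decode_timing(latent: Sequence[float], weights: Sequence[float],
                  bias: float, max_time: float) -> float:
    """Hypothetical stand-in for a learned decoder: an affine map
    followed by a sigmoid scaled to the interval (0, max_time)."""
    z = sum(w * x for w, x in zip(weights, latent)) + bias
    return max_time / (1.0 + math.exp(-z))


def decode_hybrid_action(latent_discrete: Sequence[float],
                         latent_continuous: Sequence[float],
                         table: Sequence[Sequence[float]],
                         weights: Sequence[float],
                         bias: float,
                         max_time: float) -> Tuple[int, float]:
    """Decode one latent action into (discrete index, continuous timing)."""
    index = nearest_embedding(latent_discrete, table)
    timing = decode_timing(latent_continuous, weights, bias, max_time)
    return index, timing


# Toy example: three candidate discrete actions embedded in 2-D (assumed values).
table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
action = decode_hybrid_action([0.9, 0.1], [0.0, 0.0],
                              table, [1.0, -1.0], 0.0, 10.0)
print(action)  # (1, 5.0)
```

In this toy run the latent vector [0.9, 0.1] lies closest to the second table row, so the decoded discrete action is index 1, and the zero continuous latent maps to the midpoint of the timing range. In a trained system both the table and the continuous decoder would be learned rather than fixed by hand.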
