Scheduling Drone and Mobile Charger via Hybrid-Action Deep Reinforcement Learning

Jizhe Dou, Haotian Zhang, Guodong Sun

TL;DR

The paper addresses the problem of scheduling a drone and a mobile charger to maximize observation utility while minimizing task duration in a hybrid discrete-continuous action space. It introduces HaDMC, a hybrid-action deep reinforcement learning framework that learns a continuous latent policy (via TD3) and decodes it into joint drone and charger actions using an embedding table for discrete decisions and an adversarial autoencoder for continuous timings, with a mutual-learning pre-training phase to foster cooperation. A carefully designed reward structure guides learning toward efficient observation, timely charging, and successful task completion, including lookahead incentives and penalties for failure. Experiments across diverse deployments show that HaDMC outperforms baselines and ablations, demonstrating the value of latent-action representation and inter-agent cooperation in hybrid-action scheduling, with practical implications for extended-endurance drone operations in real-world monitoring tasks.

Abstract

Recently, there has been growing interest in industry and academia in the use of wireless chargers to prolong the operational longevity of unmanned aerial vehicles (commonly known as drones). In this paper we consider a charger-assisted drone application: a drone is deployed to observe a set of points of interest, while a charger can move to recharge the drone's battery. We focus on the route and charging schedule of the drone and the mobile charger, to obtain high observation utility in the shortest possible time, while ensuring the drone remains operational during task execution. Essentially, the proposed drone-charger scheduling problem is a multi-stage decision-making process, in which the drone and the mobile charger act as two agents that cooperate to finish a task. The discrete-continuous hybrid action space of the two agents poses a significant challenge in our problem. To address this issue, we present a hybrid-action deep reinforcement learning framework, called HaDMC, which uses a standard policy learning algorithm to generate latent continuous actions. Motivated by representation learning, we specifically design and train an action decoder. It involves two pipelines that convert the latent continuous actions into the original discrete and continuous actions, through which the drone and the charger can directly interact with the environment. We embed a mutual learning scheme in model training, emphasizing collaborative rather than individual actions. We conduct extensive numerical experiments to evaluate HaDMC and compare it with state-of-the-art deep reinforcement learning approaches. The experimental results show the effectiveness and efficiency of our solution.
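
To make the two-pipeline decoding idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the discrete pipeline picks the embedding-table row nearest to the latent vector, and it replaces the learned decoder of the adversarial autoencoder with a simple tanh squashing function. The table values, the latent layout (two discrete dimensions plus one continuous dimension), and the names `EMBEDDING_TABLE`, `decode_discrete`, `decode_continuous`, and `decode_action` are all hypothetical.

```python
import math

# Hypothetical embedding table: one latent vector per discrete action
# (for instance, the index of the next point to visit). Values are made up.
EMBEDDING_TABLE = [
    [0.9, 0.1],    # discrete action 0
    [-0.8, 0.2],   # discrete action 1
    [0.0, -0.7],   # discrete action 2
]


def decode_discrete(z_dis):
    """Return the index of the table row closest (Euclidean distance) to z_dis.

    Nearest-neighbor lookup is an assumption of this sketch.
    """
    distances = [math.dist(z_dis, row) for row in EMBEDDING_TABLE]
    return distances.index(min(distances))


def decode_continuous(z_con, low=0.0, high=10.0):
    """Stand-in for a learned decoder: squash a latent scalar into [low, high]."""
    return low + (high - low) * (math.tanh(z_con) + 1.0) / 2.0


def decode_action(z):
    """Split the latent vector into discrete and continuous parts and decode each."""
    z_dis, z_con = z[:2], z[2]
    return decode_discrete(z_dis), decode_continuous(z_con)
```

In this sketch, a policy such as TD3 would emit `z`, and `decode_action(z)` would return a pair (discrete choice, continuous timing) that the environment can consume. In the framework described above, both the table and the continuous decoder are trained rather than fixed.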

Paper Structure

This paper contains 32 sections, 13 equations, 14 figures, 3 tables, 1 algorithm.

Figures (14)

  • Figure 1: Demonstration of our drone-charger scenario involving seven PoIs (marked as $p_i$) and three charging points (marked as $c_j$). The solid and dashed arrow lines represent the walks of the drone and the charger, respectively.
  • Figure 2: Four cases where the rewards are calculated based on the specific states and actions.
  • Figure 3: Basic idea of the proposed HaDMC approach.
  • Figure 4: Overall architecture of the learning model of HaDMC.
  • Figure 5: Demonstration of our embedding table in converting latent continuous action $\bm{z}$ to discrete action $a^{\rm dis}$.
  • ...and 9 more figures