Learning to Recharge: UAV Coverage Path Planning through Deep Reinforcement Learning

Mirco Theile, Harald Bayerlein, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli

TL;DR

This work proposes a novel proximal policy optimization (PPO)-based deep reinforcement learning (DRL) approach with map-based observations, utilizing action masking and discount factor scheduling to optimize coverage trajectories over the entire mission horizon.

Abstract

Coverage path planning (CPP) is a critical problem in robotics, where the goal is to find an efficient path that covers every point in an area of interest. This work addresses the power-constrained CPP problem with recharge for battery-limited unmanned aerial vehicles (UAVs). A notable challenge in this problem is integrating recharge journeys into the overall coverage strategy, which requires strategic, long-term decisions. We propose a novel proximal policy optimization (PPO)-based deep reinforcement learning (DRL) approach with map-based observations, utilizing action masking and discount factor scheduling to optimize coverage trajectories over the entire mission horizon. We further provide the agent with a position history to handle emergent state loops caused by the recharge capability. Our approach outperforms a baseline heuristic and generalizes to different target zones and maps, although generalization to unseen maps is limited. We offer insights into DRL algorithm design for long-horizon problems and provide a publicly available software framework for the CPP problem.
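To make the two techniques named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. Action masking is shown as a softmax that assigns zero probability to actions flagged invalid. Discount factor scheduling is shown as a linear ramp of the discount factor over training. The function names, the linear shape, the increasing direction, and the values 0.95 and 0.999 are all assumptions made for illustration; the abstract does not specify the schedule or the masking rule.

```python
import math


def masked_action_probs(logits, valid_mask):
    """Softmax over logits in which invalid actions get probability zero.

    `valid_mask[i]` is True when action i is allowed in the current state
    (hypothetical rule, e.g. the move would not hit an obstacle).
    """
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    top = max(l for l, ok in zip(logits, valid_mask) if ok)
    weights = [math.exp(l - top) if ok else 0.0
               for l, ok in zip(logits, valid_mask)]
    total = sum(weights)
    return [w / total for w in weights]


def scheduled_discount(step, total_steps, gamma_start=0.95, gamma_end=0.999):
    """Linearly interpolate the discount factor over training.

    Assumed schedule: start with a shorter effective horizon and move
    toward a longer one. The endpoints are illustrative values only.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return gamma_start + frac * (gamma_end - gamma_start)


if __name__ == "__main__":
    # Action 1 is masked out, so it can never be sampled.
    print(masked_action_probs([1.0, 2.0, 3.0], [True, False, True]))
    # Discount factor halfway through a hypothetical 1000-step training run.
    print(scheduled_discount(500, 1000))
```

In a PPO setting, the masked probabilities would replace the raw policy output both when sampling actions and when computing the probability ratio, so that masked actions contribute no gradient.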

Paper Structure (27 sections, 22 equations, 12 figures, 4 tables, 1 algorithm)


Figures (12)

  • Figure 1: Example state of a UAV in a coverage path planning grid-world problem on the left, showing the covered area, trajectory, and field of view, with a legend on the right.
  • Figure 2: Two scenarios in which the agent is stuck in infinite loops. Clicking an image opens a link to a video showing the behavior.
  • Figure 3: Map-based processing pipeline and neural network architecture with to-scale relative spatial dimensions.
  • Figure 4: All maps listed in Table \ref{tab:maps}, sorted by size.
  • Figure 5: Training curve using different action masks showing the median and min-max ranges of three agent training runs per masking approach.
  • ...and 7 more figures