Learning to Recharge: UAV Coverage Path Planning through Deep Reinforcement Learning

Mirco Theile; Harald Bayerlein; Marco Caccamo; Alberto L. Sangiovanni-Vincentelli

TL;DR

This work proposes a novel proximal policy optimization (PPO)-based deep reinforcement learning (DRL) approach with map-based observations, utilizing action masking and discount factor scheduling to optimize coverage trajectories over the entire mission horizon.

Abstract

Coverage path planning (CPP) is a critical problem in robotics, where the goal is to find an efficient path that covers every point in an area of interest. This work addresses the power-constrained CPP problem with recharge for battery-limited unmanned aerial vehicles (UAVs). In this problem, a notable challenge emerges from integrating recharge journeys into the overall coverage strategy, highlighting the intricate task of making strategic, long-term decisions. We propose a novel proximal policy optimization (PPO)-based deep reinforcement learning (DRL) approach with map-based observations, utilizing action masking and discount factor scheduling to optimize coverage trajectories over the entire mission horizon. We further provide the agent with a position history to handle emergent state loops caused by the recharge capability. Our approach outperforms a baseline heuristic and generalizes to different target zones and maps, although generalization to unseen maps is limited. We offer valuable insights into DRL algorithm design for long-horizon problems and provide a publicly available software framework for the CPP problem.
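As a rough illustration of the two training techniques named in the abstract, the following minimal sketch shows one common way to mask actions in a discrete policy and one possible way to schedule the discount factor. It is not the authors' implementation. The function names `masked_policy` and `scheduled_discount`, the linear shape of the schedule, its increasing direction, and the start and end values are all assumptions made for illustration.

```python
import numpy as np


def masked_policy(logits, valid_mask):
    """Turn raw policy logits into probabilities, giving zero
    probability to actions flagged as invalid (hypothetical helper)."""
    logits = np.asarray(logits, dtype=float)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    if not valid_mask.any():
        raise ValueError("at least one action must be valid")
    # Invalid actions get a logit of -inf, so exp() maps them to exactly 0.
    masked = np.where(valid_mask, logits, -np.inf)
    masked = masked - masked.max()  # shift for numerical stability
    weights = np.exp(masked)
    return weights / weights.sum()


def scheduled_discount(step, total_steps, gamma_start=0.95, gamma_end=0.999):
    """Linearly interpolate the discount factor over training.
    The linear form and the default values are assumptions."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return gamma_start + frac * (gamma_end - gamma_start)


# Example: the second action is masked out (e.g. judged unsafe).
probs = masked_policy([1.0, 2.0, 3.0], [True, False, True])
gamma_mid = scheduled_discount(step=50, total_steps=100)
```

Masking at the logit level keeps the remaining probabilities normalized, so the agent can only sample actions that the mask allows. A discount factor that moves toward 1 weights distant rewards more heavily as training progresses.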

Paper Structure (27 sections, 22 equations, 12 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Problem Formulation
UAV Grid-World
Coverage Path Planning
Power-constrained CPP without recharge
Power-constrained CPP with recharge
Methodology
Action Masking through Safety Modeling
Reward Function and Discount Factor
Position History
Global-Local Observation
Neural Network Architecture
Evaluation Setup
Heuristic for Comparison
...and 12 more sections

Figures (12)

Figure 1: Example state of a UAV in a coverage path planning grid-world problem on the left, showing the covered area, trajectory, and field of view, with a legend on the right.
Figure 2: Two scenarios in which the agent is stuck in infinite loops. By clicking on the images, a link to a video can be opened that shows the behavior.
Figure 3: Map-based processing pipeline and neural network architecture with to-scale relative spatial dimensions.
Figure 4: All maps listed in Table \ref{tab:maps}, sorted by size.
Figure 5: Training curve using different action masks showing the median and min-max ranges of three agent training runs per masking approach.
...and 7 more figures
