Learning Coverage Paths in Unknown Environments with Deep Reinforcement Learning

Arvi Jonnarth, Jie Zhao, Michael Felsberg

TL;DR

This work tackles online coverage path planning (CPP) in unknown environments by learning continuous control policies with deep reinforcement learning. It introduces egocentric, multi-scale frontier maps and a novel total variation (TV) reward that promotes complete coverage by discouraging uncovered holes, implemented with a scale-grouped CNN (SGCNN) trained with soft actor-critic (SAC). The approach outperforms previous RL-based methods and specialized CPP algorithms on both exploration and lawn mowing variants, while remaining robust to sensor noise and scaling to large environments. It enables end-to-end learning of mapping, planning, and navigation, with practical implications for autonomous robots operating in partially known settings.
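The egocentric multi-scale maps can be illustrated with a short NumPy sketch. It follows the example in Figure 2 (m = 4 scales, scale factor s = 2, 8 x 8 pixels per scale): scale i crops a window of n * s^i cells centred on the agent and average-pools it by s^i, so every scale ends up with the same pixel resolution. The function names, the zero padding outside the map, and the frontier definition used here (uncovered, non-obstacle cells with a covered 4-neighbour) are assumptions for illustration, not the authors' implementation. The sketch also only centres the maps on the agent and does not rotate them to its heading.

```python
import numpy as np


def frontier_map(covered: np.ndarray, obstacle: np.ndarray) -> np.ndarray:
    """Hypothetical frontier definition: uncovered, non-obstacle cells that
    have at least one covered 4-neighbour."""
    cov = covered.astype(bool)
    p = np.pad(cov, 1, constant_values=False)  # pad so border cells have neighbours
    has_covered_neighbour = p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]
    return (~cov & ~obstacle.astype(bool) & has_covered_neighbour).astype(np.float32)


def egocentric_multiscale(grid: np.ndarray, agent_rc, n: int = 8, m: int = 4, s: int = 2) -> np.ndarray:
    """Stack m agent-centred crops of `grid`, each average-pooled to n x n.

    Scale i covers (n * s**i) x (n * s**i) cells and is pooled by s**i.
    Cells outside the grid are treated as 0. The agent must lie inside the
    grid, and n must be even so the agent sits at the centre of every window.
    """
    if n % 2:
        raise ValueError("n must be even")
    r, c = agent_rc
    half = n * s ** (m - 1) // 2                     # half-width of the largest window
    padded = np.pad(grid.astype(np.float32), half)   # zero padding on all sides
    pr, pc = r + half, c + half                      # agent position in padded array
    out = np.empty((n, n, m), dtype=np.float32)
    for i in range(m):
        f = s ** i
        w = n * f
        win = padded[pr - w // 2: pr + w // 2, pc - w // 2: pc + w // 2]
        # Average-pool f x f blocks so each scale has n x n pixels.
        out[:, :, i] = win.reshape(n, f, n, f).mean(axis=(1, 3))
    return out


cov = np.zeros((32, 32)); cov[14:18, 10:20] = 1
obs = np.zeros((32, 32)); obs[:, 0] = 1
maps = [egocentric_multiscale(g, (16, 16)) for g in (cov, obs, frontier_map(cov, obs))]
print([mp.shape for mp in maps])  # [(8, 8, 4), (8, 8, 4), (8, 8, 4)]
```

Applying the same function to the coverage, obstacle, and frontier grids gives stacks analogous to M_c, M_o, and M_f. The finest scale keeps full detail near the agent, while the coarsest scale summarises a window s^(m-1) = 8 times wider at the same input size, which is what keeps the representation computationally feasible for large environments.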

Abstract

Coverage path planning (CPP) is the problem of finding a path that covers the entire free space of a confined area, with applications ranging from robotic lawn mowing to search-and-rescue. When the environment is unknown, the path needs to be planned online while mapping the environment, which cannot be addressed by offline planning methods that do not allow for a flexible path space. We investigate how suitable reinforcement learning is for this challenging problem, and analyze the involved components required to efficiently learn coverage paths, such as action space, input feature representation, neural network architecture, and reward function. We propose a computationally feasible egocentric map representation based on frontiers, and a novel reward term based on total variation to promote complete coverage. Through extensive experiments, we show that our approach surpasses the performance of both previous RL-based approaches and highly specialized methods across multiple CPP variations.
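The total variation (TV) reward term can be made concrete with the anisotropic discrete TV of a binary coverage map C, that is, the number of horizontally or vertically adjacent cell pairs whose coverage differs. An isolated uncovered hole contributes four such pairs, so rewarding a decrease in TV encourages filling holes and keeping the covered region compact. The sketch below assumes a reward of the form lam * (TV(C_{t-1}) - TV(C_t)) with a hypothetical weight lam; the paper's exact normalisation and weighting may differ.

```python
import numpy as np


def total_variation(cov: np.ndarray) -> float:
    """Anisotropic discrete TV: sum of absolute differences between
    vertically and horizontally adjacent cells."""
    c = cov.astype(np.float64)
    return float(np.abs(np.diff(c, axis=0)).sum() + np.abs(np.diff(c, axis=1)).sum())


def tv_reward(cov_prev: np.ndarray, cov_curr: np.ndarray, lam: float = 1.0) -> float:
    """Reward the decrease in TV between consecutive coverage maps.
    The sign convention and the weight `lam` are assumptions of this sketch."""
    return lam * (total_variation(cov_prev) - total_variation(cov_curr))


empty = np.zeros((5, 5))
corner = empty.copy(); corner[0, 0] = 1
centre = empty.copy(); centre[2, 2] = 1
hole = np.ones((5, 5)); hole[2, 2] = 0
full = np.ones((5, 5))

print(tv_reward(empty, corner))  # -2.0: a covered corner cell adds 2 boundary edges
print(tv_reward(empty, centre))  # -4.0: an isolated interior cell adds 4
print(tv_reward(hole, full))     # 4.0: filling an isolated hole removes 4
```

Because this term alone is negative for most newly covered cells, a reward like this would be combined with a term that rewards new area; the TV term then only shapes *where* the agent covers, favouring compact, hole-free coverage.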

Paper Structure

This paper contains 32 sections, 5 equations, 10 figures, and 10 tables.

Figures (10)

  • Figure 1: Learned paths for exploration (left) and lawn mowing (right), including the start (red triangle) and end position (green square).
  • Figure 2: (a) Agent-environment interaction: The observation consists of multi-scale maps from (b) and lidar detections, based on which the model predicts continuous control signals for an agent. (b) Illustration of coverage, obstacle, and frontier maps in multiple scales: This example shows $m=4$ scales with a scale factor of $s=2$. All scales are centered at the agent, and discretized into the same pixel resolution, resulting in the multi-scale maps $M_c$, $M_o$, and $M_f$, of size $8 \times 8 \times 4$ here.
  • Figure 3: The area reward $R_\mathrm{area}$ is based on the maximum possible area that can be covered in each time step.
  • Figure 4: Our proposed SGCNN architecture consists of convolutional (CONV) and fully connected (FC) layers. The scales of the multi-scale maps are convolved separately because their spatial positions are not aligned in the grid. ×3/×4 denote the number of layers.
  • Figure 5: Examples of exploration maps (a-c), lawn mowing maps (d-f), and randomly generated maps (g-h).
  • ...and 5 more figures
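Figure 3 describes the area reward as based on the maximum possible area that can be covered in each time step. A minimal sketch of such a normalised reward, assuming a coverage footprint of width `width` moving at most `v_max` for a step of length `dt`, so that at most about width * v_max * dt of new area can be covered per step. All parameter names and the weight `lam` are hypothetical, and the paper's exact definition may differ.

```python
import numpy as np


def area_reward(cov_prev, cov_curr, cell_size, width, v_max, dt, lam=1.0):
    """Newly covered area divided by the maximum area one step could cover.

    Assumes a footprint of `width` metres moving at most `v_max` m/s for
    `dt` seconds. Parameter names and `lam` are hypothetical.
    """
    new_cells = np.count_nonzero(cov_curr.astype(bool) & ~cov_prev.astype(bool))
    new_area = new_cells * cell_size ** 2   # m^2 covered for the first time
    max_area = width * v_max * dt           # upper bound on new area per step
    return lam * new_area / max_area


prev = np.zeros((10, 10), dtype=bool)
curr = prev.copy(); curr[4:6, 4:6] = True   # 4 new cells of 0.1 m x 0.1 m
# Close to 1.0: this step covered the maximum possible new area.
print(area_reward(prev, curr, cell_size=0.1, width=0.2, v_max=0.2, dt=1.0))
```

Normalising by the per-step maximum keeps the reward on a similar scale regardless of the agent's speed or footprint size, so the weight does not need to be retuned when those change.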
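Figure 4 states that the SGCNN convolves each scale separately because the scales are not spatially aligned. Below is a NumPy sketch of one such scale-grouped convolution layer, assuming the input channels are ordered scale by scale (all maps of scale 0, then all maps of scale 1, and so on) and using valid padding without a nonlinearity; the layer sizes and channel layout are hypothetical. In a deep-learning framework the same effect is usually obtained with a grouped convolution, for example the `groups` argument of PyTorch's `torch.nn.Conv2d` set to the number of scales.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_valid(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Valid 2-D cross-correlation.

    x: (H, W, C_in), w: (kh, kw, C_in, C_out), b: (C_out,).
    Returns an array of shape (H - kh + 1, W - kw + 1, C_out).
    """
    kh, kw = w.shape[:2]
    # patches[y, x_, c, i, j] == x[y + i, x_ + j, c]
    patches = sliding_window_view(x, (kh, kw), axis=(0, 1))
    return np.einsum("yxcij,ijco->yxo", patches, w) + b


def scale_grouped_conv(x: np.ndarray, weights, biases, m: int) -> np.ndarray:
    """Convolve each scale's channel group with its own kernel, then concatenate.

    x: (H, W, m * k), channels ordered scale by scale (hypothetical layout).
    weights[i]: (kh, kw, k, c_out) for scale i; biases[i]: (c_out,).
    """
    if x.shape[-1] % m != 0:
        raise ValueError("channel count must be divisible by the number of scales")
    k = x.shape[-1] // m
    outs = [
        conv2d_valid(x[..., i * k:(i + 1) * k], weights[i], biases[i])
        for i in range(m)
    ]
    return np.concatenate(outs, axis=-1)


rng = np.random.default_rng(0)
m, k, c_out = 4, 3, 8          # 4 scales, 3 maps per scale (coverage, obstacle, frontier)
x = rng.random((8, 8, m * k))  # one multi-scale observation of 8 x 8 pixels
ws = [rng.standard_normal((3, 3, k, c_out)) for _ in range(m)]
bs = [np.zeros(c_out) for _ in range(m)]
print(scale_grouped_conv(x, ws, bs, m).shape)  # (6, 6, 32)
```

Keeping the groups separate means a kernel never mixes a fine-scale pixel with the unrelated coarse-scale pixel that happens to share its grid position; only the later fully connected layers combine information across scales.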