Deep Reinforcement Learning-based Large-scale Robot Exploration

Yuhong Cao, Rui Zhao, Yizhuo Wang, Bairan Xiang, Guillaume Sartoretti

TL;DR

This work proposes a deep reinforcement learning (DRL) based reactive planner for large-scale Lidar-based autonomous robot exploration in a 2D action space. The planner relies on ground truth information during training (privileged learning) and on a graph rarefaction algorithm, which allows models trained in small-scale environments to scale to large-scale ones.

Abstract

In this work, we propose a deep reinforcement learning (DRL) based reactive planner to solve large-scale Lidar-based autonomous robot exploration problems in 2D action space. Our DRL-based planner allows the agent to reactively plan its exploration path by making implicit predictions about unknown areas, based on a learned estimation of the underlying transition model of the environment. To this end, our approach relies on learned attention mechanisms for their powerful ability to capture long-term dependencies at different spatial scales to reason about the robot's entire belief over known areas. Our approach relies on ground truth information (i.e., privileged learning) to guide the environment estimation during training, as well as on a graph rarefaction algorithm, which allows models trained in small-scale environments to scale to large-scale ones. Simulation results show that our model exhibits better exploration efficiency (12% in path length, 6% in makespan) and lower planning time (60%) than the state-of-the-art planners in a 130m x 100m benchmark scenario. We also validate our learned model on hardware.

Paper Structure

This paper contains 14 sections, 2 equations, 9 figures, 3 tables, and 1 algorithm.

Figures (9)

  • Figure 1: A mobile ground robot exploring an indoor lab environment using our DRL-based planner. Top-left: view from the robot's onboard camera (not used for mapping). Top-right: agent's current belief (map) of the environment, and view of our neural network's inputs. Bottom: 3D point cloud (map) of the environment constructed by the robot. The axes system represents the current position and orientation of the robot, the purple ball the next waypoint output by our planner,
  • Figure 2: Our proposed DRL-based planner. The robot first transforms its current map (point cloud) data to an occupancy grid, and then extracts and rarefies an informative graph from it. After that, this graph (node features and an adjacency edge mask) is fed to our attention-based network (consisting of an encoder and a decoder), which finally outputs a policy over which neighboring node should be the next waypoint.
  • Figure 3: Informative graph (left) and ground-truth graph (right). The informative graph is the input of our policy network. The ground-truth graph is the input of our critic network, which is only used during training to assist the learning of the policy. Nodes are color-coded based on their utility (dark purple to yellow, low to high). The blue trajectory is the path executed so far by the robot (light purple node).
  • Figure 4: Comparison of exploration paths in a large-scale $130\,\mathrm{m}\times 100\,\mathrm{m}$ indoor office simulation.
  • Figure 5: Comparison of exploration paths in a large-scale $150\,\mathrm{m}\times 150\,\mathrm{m}$ outdoor forest simulation.
  • ...and 4 more figures
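
The last step in the Figure 2 caption, turning node features and an adjacency edge mask into a policy over neighboring nodes, can be illustrated with a small sketch. This is not the authors' network: it is a single, untrained attention head in pure Python, and the function names (`project`, `masked_neighbor_policy`), the feature layout (x, y, utility), the random weights, and the mask convention (`True` means "connected") are all assumptions made for illustration.

```python
import math
import random


def project(x, w):
    """Multiply a length-f vector by an f x d matrix (list of rows)."""
    return [sum(xi * w[i][k] for i, xi in enumerate(x)) for k in range(len(w[0]))]


def masked_neighbor_policy(node_features, adjacency, current, w_query, w_key):
    """Return a probability for every node; only neighbors of `current` get mass.

    adjacency[i][j] is True when nodes i and j share an edge (assumed convention).
    """
    query = project(node_features[current], w_query)
    scale = math.sqrt(len(query))
    logits = []
    for j, feat in enumerate(node_features):
        if adjacency[current][j]:
            key = project(feat, w_key)
            logits.append(sum(q * k for q, k in zip(query, key)) / scale)
        else:
            # Masked-out nodes can never be selected as the next waypoint.
            logits.append(-math.inf)
    top = max(logits)
    if top == -math.inf:
        raise ValueError("current node has no neighbors")
    # Numerically stable softmax; exp(-inf) evaluates to 0.0.
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy graph: four nodes on a square, each with hypothetical (x, y, utility) features.
features = [[0.0, 0.0, 0.2], [1.0, 0.0, 0.9], [0.0, 1.0, 0.1], [1.0, 1.0, 0.5]]
adjacency = [
    [False, True, True, False],
    [True, False, False, True],
    [True, False, False, True],
    [False, True, True, False],
]

rng = random.Random(0)  # random weights stand in for learned parameters
w_query = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
w_key = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]

policy = masked_neighbor_policy(features, adjacency, 0, w_query, w_key)
next_waypoint = max(range(len(policy)), key=policy.__getitem__)
```

Here the robot is assumed to sit at node 0, so only nodes 1 and 2 receive non-zero probability. A trained model would learn the projection weights and would typically stack several attention layers before this final selection step.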