Towards Cognitive Exploration through Deep Reinforcement Learning for Mobile Robots

Lei Tai, Ming Liu

TL;DR

The paper tackles autonomous exploration for mobile robots in unknown indoor environments by using raw depth data as input to an end-to-end deep reinforcement learning framework. It initializes the DRL network from a CNN trained on real-world data and trains online in Gazebo-based simulations with collision-based rewards, enabling adaptation to unseen scenes without labeling. Results show that the DRL approach outperforms CNN-based supervised and traditional RL baselines and can transfer from simulation to real-world settings, with receptive-field visualization providing interpretability of traversability decisions. The study suggests promising directions, including incorporating RGB inputs and modern CNN backbones to further enhance perception and control in broader environments.

Abstract

Exploration in an unknown environment is a core functionality for mobile robots. Learning-based exploration methods, including convolutional neural networks, provide excellent strategies without human-designed logic for feature extraction. However, conventional supervised learning algorithms inevitably require considerable effort to label datasets, and scenes not included in the training set are mostly unrecognized. We propose a deep reinforcement learning method for the exploration of mobile robots in an indoor environment using only the depth information from an RGB-D sensor. Based on the Deep Q-Network framework, the raw depth image is taken as the only input to estimate the Q values corresponding to all moving commands. The network weights are trained end-to-end. In arbitrarily constructed simulation environments, we show that the robot can quickly adapt to unfamiliar scenes without any manual labeling. In addition, analysis of the receptive fields of the feature representations shows that deep reinforcement learning motivates the convolutional networks to estimate the traversability of the scenes. The test results are compared with exploration strategies based separately on deep learning or reinforcement learning. Even though the model is trained only in the simulated environment, experimental results in a real-world environment demonstrate that the cognitive ability of the robot controller is dramatically improved compared with the supervised method. We believe this is the first time that raw sensor information has been used to build a cognitive exploration strategy for mobile robots through end-to-end deep reinforcement learning.
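
To make the Deep Q-Network idea in the abstract concrete, the following is a minimal sketch of how Q values over moving commands might be turned into an action and a training target. It is not the authors' implementation: the function names, the discount factor `gamma`, and the reward values in the usage lines are hypothetical, and the network that would produce `q_values` from a depth image is omitted. The only pieces taken from the text are that one Q value is estimated per moving command and that rewards are collision-based.

```python
import random


def select_command(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over the Q values of the moving commands.

    With probability epsilon a random command index is returned,
    otherwise the index of the largest Q value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def q_target(reward, next_q_values, collided, gamma=0.85):
    """Standard one-step Q-learning target.

    Assumption: a collision ends the episode, so no future value is
    bootstrapped in that case. gamma=0.85 is a placeholder value.
    """
    if collided:
        return reward
    return reward + gamma * max(next_q_values)


def td_loss(q_values, action, target):
    """Squared temporal-difference error for the chosen command."""
    return (q_values[action] - target) ** 2


# Hypothetical usage: three moving commands, greedy selection.
q_now = [0.1, 0.9, 0.3]
action = select_command(q_now, epsilon=0.0)
target = q_target(reward=1.0, next_q_values=[2.0, 4.0, 1.0],
                  collided=False, gamma=0.5)
loss = td_loss(q_now, action, target)
```

In a full training loop, `loss` would be averaged over a batch of stored transitions and back-propagated through the convolutional network; that part depends on the deep learning framework and is left out here.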

Paper Structure

This paper contains 15 sections, 2 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: The simulated indoor environment implemented in Gazebo with various scenes and obstacles. The TurtleBot is the experimental agent, with a Kinect camera mounted on it. One of the 12 locations marked in the figure is randomly chosen as the start point in each training episode. The red arrow at each location represents the initial moving direction.
  • Figure 2: The network structure for the actor-evaluation estimation. It combines convolutional networks for feature extraction with fully-connected layers for policy learning. Both have been separately shown to be effective in our previous work tai2016deeptl_rcar_2016.
  • Figure 3: The loss curve, which decreases as training iterates. A batch of 32 samples is used for back-propagation in every iteration step.
  • Figure 4: Heatmaps of the trajectory points' locations in the 10 test episodes of each model for all 12 start points. The count of points in every map grid cell is normalized to [0,1]. Note that the circles at the bottom-left corner of (b) and in the middle of (c) are actually stacks of circular trajectories caused by the actual motion of the robot.
  • Figure 5: The receptive fields of the feature representations extracted by convolutional neural networks in both simulated environment samples and real-world samples. The purple area marked on the raw depth image represents the highest $10\%$ activation values. Both the supervised learning model (SL) and the 7500-iteration deep reinforcement learning model (DRL) are compared. The arrow at the bottom of each receptive-field image is the chosen moving command based on the evaluations listed in Table \ref{tab:table_score}. The left column shows RGB images taken from the same scenes for reference.
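
The captions of Figures 4 and 5 mention two small post-processing steps: normalizing per-cell trajectory counts to [0,1] and marking the highest 10% of activation values. The sketch below illustrates one plausible reading of each step. The function names are hypothetical, dividing by the maximum count is an assumption (the caption does not say how the normalization is done), and the mask function covers only the thresholding, not the mapping of network activations back onto the depth image.

```python
def normalize_counts(grid):
    """Scale per-cell visit counts to [0, 1].

    Assumption: counts are divided by the largest count in the grid.
    An all-zero grid is returned as all zeros.
    """
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0.0 for _ in row] for row in grid]
    return [[count / peak for count in row] for row in grid]


def top_fraction_mask(activations, fraction=0.10):
    """Boolean mask of the cells holding the highest activation values.

    The threshold is the k-th largest value, where k is the given
    fraction of all cells (at least one). Ties at the threshold are
    all kept, so the mask can cover slightly more than the fraction.
    """
    flat = sorted((v for row in activations for v in row), reverse=True)
    k = max(1, int(len(flat) * fraction))
    threshold = flat[k - 1]
    return [[v >= threshold for v in row] for row in activations]


# Hypothetical usage on tiny grids.
heatmap = normalize_counts([[0, 2], [4, 1]])
mask = top_fraction_mask([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
```

With ten activation values and `fraction=0.10`, only the single largest cell is marked, which corresponds to the purple region in the caption reduced to one cell.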