Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments

Mariusz Wisniewski, Paraskevas Chatzithanos, Weisi Guo, Antonios Tsourdos

TL;DR

We present a benchmark of widely used and emerging DRL algorithms on two navigation tasks (Lidar + position, and end-to-end vision) with configurable sensor denial effects, together with an evaluation of adversarial training.

Abstract

Deep reinforcement learning (DRL) is used to enable autonomous navigation in unknown environments. Most research assumes perfect sensor data, but real-world environments may contain natural and artificial sensor noise and denial. Here, we present a benchmark of both widely used and emerging DRL algorithms in a navigation task with configurable sensor denial effects. In particular, we compare how different DRL methods (e.g. model-free PPO vs. model-based DreamerV3) are affected by sensor denial. We show that DreamerV3 outperforms other methods in the visual end-to-end navigation task with a dynamic goal, a task the other methods fail to learn. Furthermore, DreamerV3 generally outperforms other methods in sensor-denied environments. To improve robustness, we use adversarial training and demonstrate improved performance in denied environments, although this generally comes at a performance cost in the vanilla environments. We anticipate that this benchmark of DRL methods and adversarial training will be a starting point for developing more elaborate navigation strategies capable of dealing with uncertain and denied sensor readings.

Paper Structure

This paper contains 24 sections, 23 figures, 3 tables.

Figures (23)

  • Figure 1: Example of the DRL-Robot-Nav Maze Environment cimurs_goal-driven_2021. Top-left: top-down view of the environment rendered in Gazebo. Bottom-left: the view from the RGB camera. Right: RViz visualization of the discretized Lidar sensor readings, the robot, and the noisy sensor areas (blue: camera noise area, red: Lidar noise area). The pink sphere (top left) is added as a visual cue to mark the goal for vision-based policies.
  • Figure 2: Diagram of the experimental pipeline. The environment has clean, camera noise, and Lidar noise variants, and outputs observations in two modalities, Lidar and camera. Multiple algorithms are then trained to navigate to the goal whilst avoiding obstacles.
  • Figure 3: Example of a robot in the Lidar noise area, visualized in RViz. Once inside the red Lidar noise area, the discretized Lidar points have Gaussian noise added to them and no longer provide an accurate representation of the environment. Example of a robot in the camera noise area, visualized in RViz. Once inside the blue camera noise area, all pixels of the image feed (shown on the left) turn black, simulating a camera failure.
  • Figure 4: Camera and Lidar observation networks. In the camera observation models (TD3, PPO, PPO-LSTM), the image is first passed through convolutional layers and flattened to 256 features to reduce the input dimensionality. These features are then fed to the actor-critic MLP. For the Lidar observation models (TD3, PPO), the input is already low-dimensional, so the 20 Lidar readings are concatenated with the distance to the goal, the angle to the goal, and the previous actions, and fed directly to the MLP.
  • Figure 5: Training and evaluation scenarios for Lidar and camera sensors. For the Lidar scenarios, the models are trained on a range of maps: the 0x0 (vanilla) map and maps with increasing amounts of sensor perturbation, namely 3x3, 5x5, and 7x7 metre sensor noise zones. Inside these zones, Gaussian noise is added to the sensor readings. The trained policies are then evaluated on the same map. Likewise, for the camera scenarios, the models are trained on the 0x0 (vanilla) map and on maps with 3x3, 5x5, and 7x7 metre sensor denial zones, inside which the camera completely fails (all pixels turn black). The models are trained on the default map, with a static goal in the middle and blue areas in which the camera completely fails. These policies are then evaluated on the default map with varying degrees of noise. A second evaluation is performed on a map with no obstacles, to better understand the learned policies in sensor-denied areas.
  • ...and 18 more figures
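The two perturbations described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's code: the `NoiseZone` class, the square-zone membership test, the noise standard deviation `sigma`, and the clamping of noisy ranges at zero are all hypothetical choices. The camera image is represented as a nested H×W×C list of pixel values.

```python
import random
from dataclasses import dataclass


@dataclass
class NoiseZone:
    """Hypothetical axis-aligned square zone: centre (cx, cy), side length in metres."""
    cx: float
    cy: float
    side: float

    def contains(self, x: float, y: float) -> bool:
        # Strict inequality, so a 0x0 (vanilla) zone never contains the robot.
        half = self.side / 2.0
        return abs(x - self.cx) < half and abs(y - self.cy) < half


def perturb_lidar(readings, robot_xy, zone, sigma=0.5, rng=random):
    """Add zero-mean Gaussian noise to every Lidar reading while the robot is in the zone."""
    if not zone.contains(*robot_xy):
        return list(readings)
    # sigma and the clamp at 0.0 are assumptions, not values from the paper.
    return [max(0.0, r + rng.gauss(0.0, sigma)) for r in readings]


def deny_camera(image, robot_xy, zone):
    """Simulate a camera failure: inside the zone, every pixel channel becomes 0 (black)."""
    if not zone.contains(*robot_xy):
        return image
    return [[[0] * len(pixel) for pixel in row] for row in image]
```

Outside the zone both functions pass the sensor data through unchanged, so one wrapper covers both the vanilla and the perturbed maps.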
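The Lidar observation described in the Figure 4 caption (20 readings, distance to goal, angle to goal, previous actions) can be assembled as a single flat vector. This is a minimal sketch assuming the robot pose is given as (x, y, yaw), the goal as (x, y), and the previous action as a (linear, angular) velocity pair. These representations and the helper names `goal_polar` and `lidar_observation` are hypothetical, and the caption does not specify the exact ordering or any normalization.

```python
import math


def goal_polar(robot_x, robot_y, robot_yaw, goal_x, goal_y):
    """Return (distance, heading error) to the goal; heading wrapped to [-pi, pi)."""
    dx, dy = goal_x - robot_x, goal_y - robot_y
    distance = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) - robot_yaw
    # Wrap so that turning left and right are represented symmetrically.
    angle = (angle + math.pi) % (2 * math.pi) - math.pi
    return distance, angle


def lidar_observation(lidar, robot_pose, goal_xy, prev_action):
    """Concatenate Lidar ranges, goal distance/angle and the previous action (assumed layout)."""
    if len(lidar) != 20:
        raise ValueError("expected 20 discretized Lidar readings")
    distance, angle = goal_polar(*robot_pose, *goal_xy)
    return list(lidar) + [distance, angle] + list(prev_action)
```

With two action components this yields a 24-dimensional input, small enough to feed to the MLP without a convolutional encoder.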
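One possible reading of the protocol in the Figure 5 caption, written as an enumeration of (sensor, training zone, evaluation zone, evaluation map) tuples. The caption does not state whether each camera policy is evaluated on every zone size, so the cross-product below is an assumption, as are the `Scenario` class and the map labels `"default"` and `"no_obstacles"`.

```python
from dataclasses import dataclass
from itertools import product

ZONE_SIDES_M = (0, 3, 5, 7)  # side length in metres; 0 = vanilla map, no zone


@dataclass(frozen=True)
class Scenario:
    sensor: str      # "lidar" or "camera"
    train_zone: int  # zone side length (m) during training
    eval_zone: int   # zone side length (m) during evaluation
    eval_map: str    # hypothetical labels: "default" or "no_obstacles"


def build_scenarios():
    scenarios = []
    # Lidar: each policy is evaluated on the same map it was trained on.
    for side in ZONE_SIDES_M:
        scenarios.append(Scenario("lidar", side, side, "default"))
    # Camera (assumed cross-evaluation): every training zone size is tested on
    # every zone size, on both the default and the obstacle-free map.
    for train, evaluate, eval_map in product(
        ZONE_SIDES_M, ZONE_SIDES_M, ("default", "no_obstacles")
    ):
        scenarios.append(Scenario("camera", train, evaluate, eval_map))
    return scenarios
```

Under this reading there are 4 Lidar and 32 camera evaluation runs per algorithm; a narrower reading of the caption would simply drop tuples from the camera loop.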