Target Search and Navigation in Heterogeneous Robot Systems with Deep Reinforcement Learning

Yun Chen, Jiaping Xiao

TL;DR

The paper addresses target search and navigation in unknown, mine-like environments with a heterogeneous UAV-UGV system. It introduces a multi-stage reinforcement learning framework that combines PPO2 with an Intrinsic Curiosity Module (ICM) to cope with sparse rewards and enable collaboration: the UAV is trained first, and the UGV is then trained while the UAV continues to learn. Key contributions are a curriculum-like training procedure, a detailed action, observation, and reward design for both robots, and experiments in Unity3D showing faster training and more robust inference than the baselines. The approach is relevant to autonomous search and rescue (SAR) missions in complex, unmapped terrain, where aerial-ground coordination must work without pre-existing maps or target-location information.
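The staged procedure (UAV first, then UGV while the UAV keeps learning) can be pictured as a schedule that decides which policies receive updates at a given training step. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Stage` class, the stage names, and the step budgets are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    steps: int             # hypothetical budget of environment steps for this stage
    trainable: frozenset   # agents whose policies are updated during this stage


# Hypothetical two-stage schedule: the UAV trains alone first,
# then the UGV joins while the UAV continues to learn.
SCHEDULE = (
    Stage("uav_only", steps=1_000, trainable=frozenset({"uav"})),
    Stage("uav_and_ugv", steps=2_000, trainable=frozenset({"uav", "ugv"})),
)


def trainable_agents(global_step: int, schedule=SCHEDULE) -> frozenset:
    """Return the agents whose policies should be updated at this step."""
    boundary = 0
    for stage in schedule:
        boundary += stage.steps
        if global_step < boundary:
            return stage.trainable
    # Past the end of the schedule: keep the final stage's setting.
    return schedule[-1].trainable
```

A training loop would call `trainable_agents(step)` before each update and skip the optimizer step for any agent not in the returned set. Note that the UAV stays in the second stage's set, which reflects the statement that it continues to learn.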

Abstract

Collaborative heterogeneous robot systems can greatly improve the efficiency of target search and navigation tasks. In this paper, we design a heterogeneous robot system consisting of a UAV and a UGV for search and rescue missions in unknown environments. The system searches for targets and navigates to them in a maze-like mine environment using policies learned through deep reinforcement learning. If the two robots are trained simultaneously, the rewards related to their collaboration may not be properly obtained. Hence, we introduce a multi-stage reinforcement learning framework and a curiosity module to encourage agents to explore unvisited environments. Experiments in simulation show that our framework can train the heterogeneous robot system to search and navigate with unknown target locations, which existing baselines may fail to do, and that it accelerates training.
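To make the role of the curiosity module concrete, the sketch below computes a curiosity bonus in the form commonly used for the Intrinsic Curiosity Module: a scaled squared error between the predicted and the actual next-state features. This is an assumption about the general technique, since this summary does not give the paper's own equations; the function names and the scaling factor `eta` are hypothetical.

```python
import numpy as np


def intrinsic_reward(phi_next_pred, phi_next, eta=0.01):
    """Curiosity bonus: (eta / 2) * squared error of the forward model in feature space.

    phi_next_pred: features of the next state predicted by a forward model.
    phi_next:      features of the next state actually observed.
    eta:           hypothetical scaling factor for the bonus.
    """
    diff = np.asarray(phi_next_pred, dtype=float) - np.asarray(phi_next, dtype=float)
    return 0.5 * eta * float(np.sum(diff ** 2))


def total_reward(extrinsic, phi_next_pred, phi_next, eta=0.01):
    """Reward used for the policy update: environment reward plus curiosity bonus."""
    return extrinsic + intrinsic_reward(phi_next_pred, phi_next, eta)
```

Transitions that the forward model predicts poorly, typically those in unvisited parts of the environment, earn a larger bonus, so the agent still receives a learning signal when the extrinsic reward is sparse.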

Paper Structure (23 sections, 14 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: Top view of the simulation environment designed for search and rescue in an underground mine scenario. The black lines denote walls, and the victim, represented by a sphere, appears at random in one of the two branches when the environment is generated.
  • Figure 2: Architecture for the ICM.
  • Figure 3: Architecture of the PPO2-ICM model for the policy training.
  • Figure 4: Cumulative reward for the model with and without the ICM. Pink is with the ICM; black is without.
  • Figure 5: (a) Cumulative reward of UAV. (b) Cumulative reward of UGV.
  • ...and 1 more figure