Human-Aware Robot Navigation via Reinforcement Learning with Hindsight Experience Replay and Curriculum Learning

Keyu Li, Ye Lu, Max Q. -H. Meng

TL;DR

This work combines hindsight experience replay (HER) and curriculum learning (CL) with reinforcement learning (RL) to efficiently learn a navigation policy for dense crowds, and demonstrates that the method can learn human-aware navigation without requiring expert demonstration data.

Abstract

In recent years, the growing demand for more intelligent service robots has been pushing the development of mobile robot navigation algorithms that allow safe and efficient operation in a dense crowd. Reinforcement learning (RL) approaches have shown superior ability in solving sequential decision-making problems, and recent work has explored their potential to learn navigation policies in a socially compliant manner. However, the expert demonstration data used in existing methods is usually expensive and difficult to obtain. In this work, we consider the task of training an RL agent without employing demonstration data, to achieve efficient and collision-free navigation in a crowded environment. To address the sparse-reward navigation problem, we propose to incorporate the hindsight experience replay (HER) and curriculum learning (CL) techniques with RL to efficiently learn the optimal navigation policy in the dense crowd. The effectiveness of our method is validated in a simulated crowd-robot coexisting environment. The results demonstrate that our method can effectively learn human-aware navigation without requiring additional demonstration data.
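To make the role of HER in a sparse-reward setting concrete, here is a minimal sketch of goal relabeling with the "final" strategy: a failed episode is stored a second time as if the position the robot actually reached had been its goal, so that some stored transitions carry a non-zero reward. This is an illustration only, not the paper's implementation. The `Transition` fields, the `sparse_reward` function, its 0.3 tolerance, and the example coordinates are all assumptions made for this sketch.

```python
import math
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Transition(NamedTuple):
    """One step of an episode (hypothetical layout for this sketch)."""
    position: Point       # robot position before the action
    next_position: Point  # robot position after the action
    goal: Point           # goal the episode was conditioned on
    reward: float


def sparse_reward(position: Point, goal: Point, tol: float = 0.3) -> float:
    """Assumed sparse reward: 1 when within `tol` of the goal, otherwise 0."""
    return 1.0 if math.dist(position, goal) <= tol else 0.0


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """Relabel each transition with the position reached at the end of the episode."""
    achieved = episode[-1].next_position
    return [
        t._replace(goal=achieved, reward=sparse_reward(t.next_position, achieved))
        for t in episode
    ]


# A failed episode: the robot never gets near the original goal (5, 5).
episode = [
    Transition((0.0, 0.0), (1.0, 0.0), (5.0, 5.0), 0.0),
    Transition((1.0, 0.0), (2.0, 0.0), (5.0, 5.0), 0.0),
]

# Store both the original and the relabeled transitions for off-policy training.
relabeled = her_relabel_final(episode)
replay_buffer = episode + relabeled
```

In the relabeled copy, the last transition reaches the substituted goal and therefore receives the positive reward, while earlier transitions keep a reward of 0. A curriculum would then be layered on top of this, for example by gradually increasing task difficulty during training, but the exact schedule is not specified in this summary.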

Paper Structure

This paper contains 19 sections, 6 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: Overview of our proposed framework that uses a deep reinforcement learning (RL) agent trained with hindsight experience replay (HER) techniques and curriculum learning (CL) to achieve human-aware robot navigation in crowded environments.
  • Figure 2: Learning curves of the RL, RL+IL, and RL+HER methods when expert demonstration data is used in training.
  • Figure 3: Learning curves of the RL agents (a) without HER and (b) with HER, in an environment with 1 human. Expert demonstration data is not used in training.
  • Figure 4: Learning curves of the (a) RL+RS and (b) RL+HER+CL agents, in an environment with 5 humans. Expert demonstration data is not used in training.
  • Figure 5: Trajectories of the agents trained by (a) the baseline method RL+IL, and by (b) our proposed RL+HER+CL method in an environment with 5 humans. Our method outperforms the baseline in both the travel time and the trajectory length.