Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning

Changan Chen, Yuejiang Liu, Sven Kreiss, Alexandre Alahi

TL;DR

This work rethinks pairwise interactions with a self-attention mechanism and jointly models Human-Robot and Human-Human interactions in a deep reinforcement learning framework. The model captures the Human-Human interactions in dense crowds that indirectly affect the robot's anticipation capability.

Abstract

Mobility in an effective and socially-compliant manner is an essential yet challenging task for robots operating in crowded spaces. Recent works have shown the power of deep reinforcement learning techniques to learn socially cooperative policies. However, their cooperation ability deteriorates as the crowd grows since they typically relax the problem as a one-way Human-Robot interaction problem. In this work, we want to go beyond first-order Human-Robot interaction and more explicitly model Crowd-Robot Interaction (CRI). We propose to (i) rethink pairwise interactions with a self-attention mechanism, and (ii) jointly model Human-Robot as well as Human-Human interactions in the deep reinforcement learning framework. Our model captures the Human-Human interactions occurring in dense crowds that indirectly affect the robot's anticipation capability. Our proposed attentive pooling mechanism learns the collective importance of neighboring humans with respect to their future states. Various experiments demonstrate that our model can anticipate human dynamics and navigate in crowds with time efficiency, outperforming state-of-the-art methods.

Paper Structure

This paper contains 19 sections, 10 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: In this work, we present a method that jointly models Human-Robot and Human-Human interactions for navigation in crowds.
  • Figure 2: Overview of our method for socially attentive navigation made of 3 modules: Interaction, Pooling, and Planning, described in Section \ref{sec:approach}. Interactions between the robot and each human are extracted from the interaction module and subsequently aggregated in the pooling module. The planning module estimates the value of the joint state of the robot and humans for navigation in crowds.
  • Figure 3: Illustration of our interaction module. We use a multi-layer perceptron to extract the pairwise interaction feature between the robot and each human $i$. The impact of the other people on the human $i$ is represented by a local map.
  • Figure 4: Architecture of our pooling module. We use a multi-layer perceptron to compute the attention score for each person from the individual embedding vector together with the mean embedding vector. The final joint representation is a weighted sum of the pairwise interactions.
  • Figure 5: Trajectory comparison in an invisible test case. Circles are the positions of agents at the labeled times. When encountering humans, CADRL and LSTM-RL demonstrate overly aggressive and overly conservative behaviors, respectively. In contrast, our SARL and LM-SARL successfully identify a shortcut through the center, which allows the robot to keep some distance from others while navigating to the goal quickly.
  • ...and 2 more figures
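
The pooling step described in the Figure 4 caption can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `score` and `attentive_pool` are hypothetical, a single linear layer stands in for the multi-layer perceptron, and normalizing the scores with a softmax is an assumption, since the caption only says the joint representation is a weighted sum of the pairwise interactions.

```python
import math


def score(weights, bias, features):
    """Hypothetical stand-in for the attention MLP: one linear layer."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def attentive_pool(embeddings, interactions, weights, bias):
    """Pool pairwise interaction vectors into one joint representation.

    embeddings:   n embedding vectors, one per human (each of length d)
    interactions: n pairwise interaction vectors (each of length k)
    weights/bias: parameters of the scorer; weights has length 2 * d
    Returns (attention weights, pooled vector of length k).
    """
    n = len(embeddings)
    d = len(embeddings[0])

    # Mean embedding over all humans in the crowd.
    mean = [sum(e[j] for e in embeddings) / n for j in range(d)]

    # Score each human from its own embedding concatenated with the mean.
    scores = [score(weights, bias, e + mean) for e in embeddings]

    # Assumption: normalize scores with a softmax (max subtracted for stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [x / total for x in exps]

    # Joint representation: attention-weighted sum of the pairwise interactions.
    k = len(interactions[0])
    pooled = [sum(a * h[j] for a, h in zip(attn, interactions)) for j in range(k)]
    return attn, pooled


if __name__ == "__main__":
    # Three humans with made-up 2-d embeddings and 1-d interaction features.
    emb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    inter = [[3.0], [6.0], [9.0]]
    attn, pooled = attentive_pool(emb, inter, [0.5, -0.2, 0.1, 0.3], 0.0)
    print(attn, pooled)
```

Because the attention weights sum to one, the pooled vector has a fixed size regardless of how many humans are present, which is what lets a downstream planning module accept crowds of varying size.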