Learning Crowd Behaviors in Navigation with Attention-based Spatial-Temporal Graphs

Yanying Zhou, Jochen Garcke

TL;DR

This work tackles safe and efficient robot navigation in dynamic crowds by proposing ASTG, an attention-based spatial-temporal graph framework. ASTG models instantaneous spatial relations with a spatial graph and historical dynamics with a temporal graph, both using graph attention networks, and fuses them with a social attention mechanism to form a robust crowd representation for planning. The method demonstrates superior generalization and robustness against baselines across simple and complex crowd scenarios, supported by quantitative metrics and qualitative trajectory analyses. The results suggest significant practical potential for improving human-robot coexistence in real-world crowded environments.

Abstract

Safe and efficient navigation in dynamic environments shared with humans remains an open and challenging task for mobile robots. Previous works have shown the efficacy of reinforcement learning frameworks for training efficient navigation policies. However, their performance deteriorates when crowd configurations change, i.e., when crowds become larger or more complex. It is therefore crucial to fully understand the complex, dynamic interactions within the crowd so that the robot can navigate proactively and with foresight. In this paper, we propose a novel deep graph learning architecture based on attention mechanisms, which leverages a spatial-temporal graph to enhance robot navigation. We employ spatial graphs to capture the current spatial interactions, while temporal graphs, integrated with an RNN, use past trajectory information to infer the future intentions of each agent. This spatial-temporal graph reasoning allows the robot to better understand and interpret the relationships between agents over time and space, and thereby to make more informed decisions. Compared to previous state-of-the-art methods, our method demonstrates superior robustness in terms of safety, efficiency, and generalization in various challenging scenarios.
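
The pipeline described in the abstract (spatial graph attention, an RNN over past trajectories, and attention-based aggregation into a crowd feature) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single attention head, the vanilla tanh RNN, the random weights, and all dimensions are assumptions chosen only to show how the three pieces might fit together.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(h, W, a, adj):
    """Single-head GAT-style attention over agents (hypothetical helper).

    h: (N, F) agent features, W: (F, D) projection, a: (2*D,) attention vector,
    adj: (N, N) boolean adjacency that includes self-loops.
    """
    z = h @ W                                          # projected features, (N, D)
    d = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into source and target terms.
    e = (z @ a[:d])[:, None] + (z @ a[d:])[None, :]
    e = np.where(e > 0, e, 0.2 * e)                    # LeakyReLU, slope 0.2
    e = np.where(adj, e, -1e9)                         # mask out non-neighbors
    alpha = softmax(e, axis=1)                         # attention over neighbors j
    return alpha @ z, alpha

def rnn_encode(traj, Wx, Wh):
    """Vanilla tanh RNN over a trajectory history; traj: (T, N, F) -> (N, D)."""
    state = np.zeros((traj.shape[1], Wh.shape[0]))
    for x_t in traj:                                   # oldest step first
        state = np.tanh(x_t @ Wx + state @ Wh)
    return state

def attention_pool(feats, w):
    """Score each agent, then return the weighted sum as one crowd feature."""
    weights = softmax(feats @ w, axis=0)               # (N,)
    return weights @ feats, weights

# Illustrative sizes (assumed): agents, history length, input dim, hidden dim.
rng = np.random.default_rng(0)
N, T, F, D = 4, 5, 2, 8
traj = rng.normal(size=(T, N, F))                      # stand-in for past agent states
adj = np.ones((N, N), dtype=bool)                      # fully connected, with self-loops

spatial, alpha = gat_layer(traj[-1], rng.normal(size=(F, D)),
                           rng.normal(size=2 * D), adj)
temporal = rnn_encode(traj, rng.normal(size=(F, D)),
                      0.1 * rng.normal(size=(D, D)))
crowd, weights = attention_pool(np.concatenate([spatial, temporal], axis=1),
                                rng.normal(size=2 * D))
```

In a full system the resulting `crowd` vector would feed a value network that scores candidate actions; here it is only a fixed-length summary of the per-agent spatial and temporal features.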

Paper Structure (19 sections, 16 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Illustration of our work. Our model uses spatial-temporal graphs to capture complex crowd dynamics, aggregating both spatial and temporal attention maps from each agent.
  • Figure 2: Network architecture of the proposed method. (a) The spatial graph utilizes the GAT to encode direct and indirect spatial interactions between agents. (b) The temporal graph incorporates an RNN to reason about the temporal interactions based on historical information. (c) The social attention module jointly aggregates the pairwise spatial-temporal interactions to capture the crowd representation in the crowd feature, which is then used to estimate the action values.
  • Figure 3: Quantitative evaluation on scenarios with different numbers of humans.
  • Figure 4: Simple and complex scenarios. Each contains 5 dynamic humans and 5 static humans, with the static humans arranged in different configurations: (a) completely dispersed; (b) divided into two rows of 3 and 2 static humans; (c) forming a concave shape.
  • Figure 5: Quantitative evaluation under complex scenarios.
  • ...and 2 more figures