
Attention Graph for Multi-Robot Social Navigation with Deep Reinforcement Learning

Erwan Escudie, Laetitia Matignon, Jacques Saraydaryan

TL;DR

MultiSoc is a graph-based multi-robot social navigation framework that combines two GNNs (an edge selector and a crowd coordinator) to model interactions among the robots and humans within each robot's field of view. Trained with MAPPO (Multi-Agent Proximal Policy Optimization) under centralized training with decentralized execution (CTDE), it uses short-horizon predicted trajectories and a sparsified interaction graph to enable efficient coordination and safe navigation in dense crowds. Key contributions include the first graph-based multi-robot navigation model, a neighborhood density parameter that can be tuned through edge selection, and robust performance under heterogeneous human policies and varying crowd densities. The approach improves learning speed, generalization, and scalability, offering a practical path toward deploying robot fleets in real crowded environments.
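MAPPO builds on PPO's clipped surrogate objective. The sketch below is a minimal NumPy illustration of that objective, not the authors' training code. It assumes, as is common in MAPPO, that all robots share one policy (so their samples are simply pooled) and that advantages come from a centralized critic that sees global state only during training, which is the CTDE part. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are illustrative choices.

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss, averaged over all robots and timesteps.

    Assumption (MAPPO-style): one policy is shared by every robot, so the
    arrays may have shape (num_robots, T) or be flattened; the advantages are
    assumed to be estimated by a centralized critic used only in training.
    """
    ratio = np.exp(new_logp - old_logp)                 # pi_new(a|o) / pi_old(a|o)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (element-wise min) objective;
    # return its negation so it can be minimized as a loss.
    return -np.mean(np.minimum(unclipped, clipped))

# Usage: with unchanged log-probabilities the ratio is 1, so the loss
# reduces to minus the mean advantage.
loss = ppo_clip_loss(np.zeros(3), np.zeros(3), np.array([1.0, 2.0, 3.0]))
```

The clipping keeps each update close to the data-collecting policy, which matters when several robots' trajectories are pooled into one shared-policy update.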

Abstract

Learning robot navigation strategies among pedestrians is crucial for domain-specific applications. Combining perception, planning, and prediction makes it possible to model the interactions between robots and pedestrians, with impressive results, especially for recent approaches based on deep reinforcement learning (RL). However, these works do not consider multi-robot scenarios. In this paper, we present MultiSoc, a new method for learning multi-agent socially aware navigation strategies using RL. Inspired by recent work on multi-agent deep RL, our method leverages a graph-based representation of agent interactions that combines the positions and fields of view of entities (pedestrians and agents). Each agent uses a model based on two Graph Neural Networks combined with attention mechanisms. First, an edge selector produces a sparse graph; then a crowd coordinator applies node attention to produce a graph representing the influence of each entity on the others. This is incorporated into a model-free RL framework to learn multi-agent policies. We evaluate our approach in simulation through a series of experiments under various conditions (numbers of agents and pedestrians). Empirical results show that our method learns faster than single-agent deep RL social navigation techniques and enables efficient implicit multi-agent coordination in challenging crowd navigation with multiple heterogeneous humans. Furthermore, customizable meta-parameters let us adjust the neighborhood density that the navigation strategy takes into account.
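The two-stage pipeline described above (an edge selector that sparsifies the graph, then a crowd coordinator that applies node attention) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' model: the projection matrices are random stand-ins for learned weights, edge scores are scaled dot-product multi-head attention averaged over heads, and sparsification keeps the top `k` neighbors per node, with `k` playing the role of the tunable neighborhood density. The top-k rule and all function names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def edge_selector(h, w_q, w_k, k):
    """Hypothetical edge selector: keep the k best-scored neighbors per node.

    h: (N, d) node embeddings; w_q, w_k: (H, d, d_h) per-head projections.
    Returns a boolean (N, N) adjacency of the sparse graph G_S.
    """
    q = np.einsum("nd,hde->hne", h, w_q)             # (H, N, d_h)
    key = np.einsum("nd,hde->hne", h, w_k)           # (H, N, d_h)
    scores = np.einsum("hne,hme->hnm", q, key) / np.sqrt(q.shape[-1])
    scores = scores.mean(axis=0)                     # average over heads -> (N, N)
    np.fill_diagonal(scores, -np.inf)                # forbid self-edges
    k = min(k, h.shape[0] - 1)
    top = np.argsort(-scores, axis=1)[:, :k]         # k highest scores per row
    adj = np.zeros_like(scores, dtype=bool)
    np.put_along_axis(adj, top, True, axis=1)
    return adj

def crowd_coordinator(h, adj, w_q, w_k, w_v):
    """Hypothetical node attention restricted to the edges of the sparse graph."""
    q, key, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ key.T / np.sqrt(q.shape[-1])
    mask = adj | np.eye(len(h), dtype=bool)          # self-loop keeps every row non-empty
    scores = np.where(mask, scores, -np.inf)
    attn = softmax(scores, axis=1)                   # exactly 0 outside the sparse graph
    return attn @ v, attn

# Usage with random stand-in weights: 6 entities, 2 attention heads, k = 2.
rng = np.random.default_rng(0)
N, d, H, dh = 6, 8, 2, 4
h = rng.normal(size=(N, d))
adj = edge_selector(h, rng.normal(size=(H, d, dh)), rng.normal(size=(H, d, dh)), k=2)
out, attn = crowd_coordinator(h, adj, rng.normal(size=(d, dh)),
                              rng.normal(size=(d, dh)), rng.normal(size=(d, dh)))
```

A hard top-k is not differentiable, so a trained model would need a different selection mechanism or a straight-through trick; the sketch only shows how a density parameter bounds each node's neighborhood before attention is applied.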

Paper Structure (32 sections, 8 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Overview of the MultiSoc process: (Top) Scene with two agents (robots), each with a 360° field of view (FoV). (Bottom) Each agent applies MultiSoc to a graph of its environment, limited to its FoV, in which each entity (human or robot) is a node.
  • Figure 2: Simplified architectures of AttnGraph liu2023attngraph and MAGE-X yang2023magex. Agents (robots) are shown in red, and the agent of interest is surrounded by a dotted line. (Left) AttnGraph: the end of each block shows the graph actually computed and the attention weights (edge width). (Right) MAGE-X: the end of each block shows the graph actually computed.
  • Figure 3: Overview of the MultiSoc architecture. For the agent of interest (surrounded by a dotted line), the input consists of its intrinsic information and a graph limited to its FoV. Each node of the graph contains the current and successive predicted positions of an observed entity, together with a label indicating the entity's nature (human or robot).
  • Figure 4: Overview of the Edge-Selector architecture, which produces a sparse graph $G_S$ using a multi-head attention (MHA) module with 2 heads.
  • Figure 5: Screenshots of a scenario (in chronological order from left to right) with 3 robots traveling among 20 humans in the MultiCrowdNav Simulator ($N_{head}=4$).
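The node composition described for Figure 3 (current position, successive predicted positions, and an entity-type label) can be illustrated with a short NumPy sketch. Several details are assumptions, not taken from the paper: the 360° FoV is modeled as a circle of fixed `radius`, future positions come from a constant-velocity predictor, coordinates are made ego-centric, and the label is a one-hot `[human, robot]` vector. The function name, the `HUMAN`/`ROBOT` constants, and the `horizon` and `dt` values are all hypothetical.

```python
import numpy as np

HUMAN, ROBOT = 0, 1  # hypothetical label encoding

def build_fov_graph(ego_pos, positions, velocities, labels,
                    radius=5.0, horizon=5, dt=0.25):
    """Return (indices of visible entities, node feature matrix).

    positions, velocities: (N, 2) arrays for the *other* entities;
    labels: (N,) int array of HUMAN/ROBOT. Each node feature is the
    ego-centric current position, `horizon` constant-velocity predictions,
    and a one-hot label, i.e. (horizon + 1) * 2 + 2 values.
    """
    dist = np.linalg.norm(positions - ego_pos, axis=1)
    visible = np.flatnonzero(dist <= radius)             # circular 360° FoV (assumed)
    steps = np.arange(1, horizon + 1)[None, :, None] * dt  # (1, K, 1) time offsets
    p = positions[visible]                                # (M, 2)
    preds = p[:, None, :] + steps * velocities[visible][:, None, :]  # (M, K, 2)
    traj = np.concatenate([p[:, None, :], preds], axis=1) - ego_pos  # (M, K+1, 2)
    onehot = np.eye(2)[labels[visible]]                   # (M, 2)
    feats = np.concatenate([traj.reshape(len(visible), -1), onehot], axis=1)
    return visible, feats

# Usage: the entity 10 m away falls outside the assumed 5 m FoV.
ego = np.array([0.0, 0.0])
pos = np.array([[1.0, 0.0], [10.0, 0.0], [0.0, 2.0]])
vel = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, -1.0]])
lab = np.array([HUMAN, ROBOT, ROBOT])
visible, feats = build_fov_graph(ego, pos, vel, lab, radius=5.0, horizon=2, dt=0.5)
```

The resulting feature matrix could serve as the node input to an edge-selection stage; the paper's own trajectory predictor and FoV geometry may differ from these assumptions.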