Structured Graph Network for Constrained Robot Crowd Navigation with Low Fidelity Simulation

Shuijing Liu, Kaiwen Hong, Neeloy Chakraborty, Katherine Driggs-Campbell

TL;DR

This work tackles constrained robot crowd navigation under sim2real gaps by training in a low-fidelity simulator with a split scene representation: humans are represented by their detected states, and obstacles by point clouds derived from a map and robot localization. A spatio-temporal graph models the interactions among the robot, humans, and obstacles, using three dedicated attention networks (human-human HH, obstacle-human OH, and robot-human RH) and a GRU to produce robust policies trained with PPO. Simulation results show that modeling all of these interactions substantially improves navigation success and safety, and ablations highlight the importance of capturing human-human and human-obstacle interactions. Real-world experiments with a TurtleBot 2i demonstrate the approach's practical viability, although adversarial pedestrian behavior remains challenging and motivates future work on hierarchical planning and more realistic human models. Overall, the paper contributes a scalable, attention-guided framework that reduces the sim2real gap and improves constrained crowd navigation in both simulated and real environments.
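To make the attention idea concrete, the sketch below shows scaled dot-product attention in plain Python, applied in the robot-human (RH) direction: the robot's embedding scores each human, and the output is a weighted summary of the humans. This is a minimal illustration under stated assumptions, not the paper's network: the helper names, the 2-D toy embeddings, and the absence of learned query/key/value projections are all hypothetical simplifications.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: Vector, keys: List[Vector], values: List[Vector]
) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention for one query (hypothetical helper).

    Returns the attention weights and the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context


# Hypothetical RH attention: the robot embedding is the query and the human
# embeddings are the keys and values. HH attention could reuse attend() with
# each human as the query; OH attention with obstacle features as keys/values.
robot = [1.0, 0.0]
humans = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
weights, context = attend(robot, humans, humans)
print([round(w, 3) for w in weights])
```

Dividing by the square root of the embedding dimension keeps the softmax from saturating as embeddings grow. In a full network of the kind the TL;DR describes, the embeddings would come from learned layers, and the attention outputs at each timestep would be passed to the GRU.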

Abstract

We investigate the feasibility of deploying reinforcement learning (RL) policies for constrained crowd navigation that are trained in a low-fidelity simulator. We introduce a representation of the dynamic environment that separates humans from obstacles. Humans are represented by their detected states, while obstacles are represented as point clouds computed from maps and robot localization. This representation allows RL policies trained in a low-fidelity simulator to be deployed in the real world with a reduced sim2real gap. Additionally, we propose a spatio-temporal graph to model the interactions between agents and obstacles. Based on this graph, we use attention mechanisms to capture robot-human, human-human, and human-obstacle interactions. Our method significantly improves navigation performance in both simulated and real-world environments. Video demonstrations can be found at https://sites.google.com/view/constrained-crowdnav/home.
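One way to compute an obstacle point cloud from a known map and the robot's localized pose is to ray-cast a simulated 2D laser scan against a static occupancy grid. The sketch below assumes a row-major grid of 0/1 cells with a fixed resolution in meters and a pose (x, y, theta) in the map frame; the function name `raycast_point_cloud`, the ray count, and the range limit are hypothetical, and the paper's exact procedure may differ. Because the grid is a static map, humans never appear in the resulting cloud, which mirrors the separation of humans and obstacles described in the abstract.

```python
import math
from typing import List, Tuple

Grid = List[List[int]]  # 1 = occupied cell, 0 = free cell (assumed encoding)


def raycast_point_cloud(
    grid: Grid,
    resolution: float,
    pose: Tuple[float, float, float],
    num_rays: int = 8,
    max_range: float = 5.0,
) -> List[Tuple[float, float]]:
    """Simulate a 2D laser scan on a static map from a localized pose.

    Returns obstacle hits as (x, y) points in the robot frame.
    Rays that hit nothing within max_range, or leave the map, are dropped.
    """
    x0, y0, theta = pose
    step = resolution / 2.0  # march along each ray in half-cell steps
    points = []
    for i in range(num_rays):
        angle = theta + 2.0 * math.pi * i / num_rays
        r = 0.0
        while r <= max_range:
            wx = x0 + r * math.cos(angle)
            wy = y0 + r * math.sin(angle)
            col, row = int(wx // resolution), int(wy // resolution)
            if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
                break  # left the map without a hit
            if grid[row][col] == 1:
                # Rotate the world-frame offset into the robot frame.
                dx, dy = wx - x0, wy - y0
                c, s = math.cos(-theta), math.sin(-theta)
                points.append((c * dx - s * dy, s * dx + c * dy))
                break
            r += step
    return points


# Hypothetical 5 m x 5 m map (0.5 m cells) with a wall in the right-most column.
grid = [[0] * 9 + [1] for _ in range(10)]
pts = raycast_point_cloud(grid, resolution=0.5, pose=(2.5, 2.5, 0.0))
print(len(pts), pts[0])
```

Fixed half-cell stepping is the simplest choice but can clip past thin diagonal corners; an exact grid traversal such as DDA would avoid that at the cost of a little more code.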

Paper Structure (19 sections, 5 equations, 5 figures)

Figures (5)

  • Figure 1: A split representation of a constrained navigation scenario. In a dynamic scene, human information is obtained from sensor detections. For obstacle information, we remove all humans and compute a point cloud from a known map and the robot's location. In this way, we can learn a robot policy with a smaller sim2real gap using a cheap, low-fidelity simulator.
  • Figure 2: Illustration of map processing. Using off-the-shelf map processing techniques [liu2014extracting], we combine and smooth the edges of irregularly shaped obstacles. The processed map then yields the obstacle point cloud representation, which introduces only a very small sim2real gap. For visualization, the processed map is overlaid on each of the two raw maps on the left.
  • Figure 3: The spatial-temporal interaction graph and the network architecture. (a) Graph representation of crowd navigation. The robot node is in yellow, the $i$-th human node is $\mathrm{u}_i$, and the obstacle node is $o$. HH edges and HH functions are in blue, OH edges and OH functions are in green, and RH edges and RH functions are in red. The temporal function is in purple. (b) Our network. Three attention mechanisms are used to model the human-human, human-obstacle, and robot-human interactions. We use a GRU as the temporal function.
  • Figure 4: Two PyBullet simulation scenarios. (a) Random environment with random obstacles and circle-crossing humans. (b) Sim2real environment with fixed obstacles, where the random human flow is designed based on the layout.
  • Figure 5: A two-level hierarchical planner. To enable long-horizon navigation, we can treat our method as a local planner and combine it with a global planner.
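Figure 5 treats the learned policy as a local planner beneath a global planner. A minimal sketch of the interface between the two levels follows, assuming the global planner returns a non-empty list of 2D waypoints and the local policy accepts a single goal point. The function name `select_subgoal` and the 2 m lookahead are hypothetical choices for illustration, not details taken from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def select_subgoal(path: List[Point], robot: Point, lookahead: float = 2.0) -> Point:
    """Pick the first global-path waypoint at least `lookahead` meters away.

    `path` is assumed to come from a global planner on the map; the returned
    point would serve as the goal for the learned local policy. Near the end
    of the path, falls back to the final waypoint.
    """
    # Start from the waypoint closest to the robot so passed waypoints are skipped.
    start = min(range(len(path)), key=lambda i: math.dist(path[i], robot))
    for waypoint in path[start:]:
        if math.dist(waypoint, robot) >= lookahead:
            return waypoint
    return path[-1]


# Hypothetical L-shaped global path.
path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (3.0, 1.0), (3.0, 2.0)]
print(select_subgoal(path, (0.9, 0.1)))
print(select_subgoal(path, (3.0, 1.8)))
```

Searching forward from the closest waypoint keeps the subgoal from jumping back to parts of the path the robot has already passed, while the fallback lets the local policy drive directly to the final goal once it is within the lookahead distance.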