Don't Freeze, Don't Crash: Extending the Safe Operating Range of Neural Navigation in Dense Crowds

Jiefu Zhang, Yang Xu, Vaneet Aggarwal

TL;DR

This paper proposes a reinforcement learning approach for navigation in dense, variable-density crowds. It attains zero-shot density generalization by combining a density-invariant observation encoding, density-randomized training, and physics-informed proxemic reward shaping with density-adaptive scaling.

Abstract

Navigating safely through dense crowds requires collision avoidance that generalizes beyond the densities seen during training. Learning-based crowd navigation can break under out-of-distribution crowd sizes due to density-sensitive observation normalization and social-cost scaling, while analytical solvers often remain safe but freeze in tight interactions. We propose a reinforcement learning approach for dense, variable-density navigation that attains zero-shot density generalization using a density-invariant observation encoding with density-randomized training and physics-informed proxemic reward shaping with density-adaptive scaling. The encoding represents the distance-sorted $K$ nearest pedestrians plus bounded crowd summaries, keeping input statistics stable as crowd size grows. Trained with $N \in [11,16]$ pedestrians in a $3\,\mathrm{m} \times 3\,\mathrm{m}$ arena and evaluated with up to $N = 21$ pedestrians ($1.3\times$ denser), our policy reaches the goal in $>99\%$ of episodes and achieves $86\%$ collision-free success in random crowds, with markedly less freezing than analytical methods and a $>60$-point collision-free margin over learning-based benchmark methods. Code is available at https://github.com/jznmsl/PSS-Social.
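
The abstract's density-invariant encoding (distance-sorted $K$ nearest pedestrians plus bounded crowd summaries) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `encode_observation`, the values `k=6` and `sense_radius=3.0`, the per-slot layout, and the two summary scalars are all assumptions chosen only to show why the observation length and value ranges stay fixed as the crowd grows.

```python
import math

def encode_observation(ego_xy, goal_xy, pedestrians, k=6, sense_radius=3.0):
    """Hypothetical fixed-length encoding; every name and constant is illustrative.

    pedestrians: list of (x, y, vx, vy) tuples in the world frame.
    Returns a flat list of length 2 + 5 * k + 2, whatever the crowd size.
    """
    ex, ey = ego_xy
    gx, gy = goal_xy
    # Ego-goal features: goal offset scaled by the (assumed) sensing radius.
    ego_goal = [(gx - ex) / sense_radius, (gy - ey) / sense_radius]

    # Relative pedestrian states, sorted by distance to the robot.
    rel = []
    for px, py, vx, vy in pedestrians:
        dx, dy = px - ex, py - ey
        rel.append((math.hypot(dx, dy), dx, dy, vx, vy))
    rel.sort(key=lambda r: r[0])

    # K-cap truncation: keep the K nearest, zero-pad empty slots (mask flag = 0).
    slots = []
    for i in range(k):
        if i < len(rel):
            _, dx, dy, vx, vy = rel[i]
            slots += [dx / sense_radius, dy / sense_radius, vx, vy, 1.0]
        else:
            slots += [0.0, 0.0, 0.0, 0.0, 0.0]

    # Bounded crowd summaries: both stay in [0, 1] however large the crowd is.
    nearest = rel[0][0] if rel else sense_radius
    summaries = [
        math.tanh(len(rel) / k),                    # saturating crowd-size cue
        min(nearest, sense_radius) / sense_radius,  # clipped nearest distance
    ]
    return ego_goal + slots + summaries
```

With this layout, a crowd of 11 and a crowd of 21 pedestrians produce vectors of the same length, and the extra pedestrians influence the input only through the saturating summary scalars.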

Paper Structure

This paper contains 18 sections, 14 equations, 4 figures, 4 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Overview of the PSS-Social pipeline. The simulator state is mapped to a fixed-dimensional observation consisting of ego-goal features, distance-sorted neighbor slots with $K$-cap truncation, and bounded crowd-summary scalars. The normalized observation is passed to an MLP policy trained with PPO. The training reward adds potential-based proxemic shaping with density-adaptive scaling to the environment’s extrinsic navigation reward.
  • Figure 2: Analysis of specific failure modes across density. (a) Collisions per episode measures the frequency of physical contact, distinguishing unsafe behavior from safe navigation. (b) Freezing rate tracks the percentage of time agents spend deadlocked/stationary, identifying conservative failure modes.
  • Figure 3: Episode Outcomes. Green: Safe success; Amber: Collision; Red: Timeout. Baselines shift from frequent collisions to total failure as density rises, while PSS-Social maintains robust performance.
  • Figure 4: Safe Success Rate across the density sweep.
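
The Figure 1 caption mentions potential-based proxemic shaping with density-adaptive scaling added to the extrinsic reward. The sketch below shows the standard potential-based form $r' = r + w\,(\gamma\,\Phi(s') - \Phi(s))$ under loudly stated assumptions: the potential `proxemic_potential`, the `comfort_radius` value, and the choice to average over intruding pedestrians (a stand-in for density-adaptive scaling) are hypothetical and are not taken from the paper.

```python
import math

def proxemic_potential(ego_xy, pedestrians_xy, comfort_radius=0.5):
    """Hypothetical potential in [-1, 0]: more negative when personal space is intruded.

    Averaging over intruders (an assumed form of density-adaptive scaling)
    keeps the magnitude bounded as the number of pedestrians grows.
    """
    ex, ey = ego_xy
    intrusions = []
    for px, py in pedestrians_xy:
        d = math.hypot(px - ex, py - ey)
        if d < comfort_radius:
            intrusions.append((comfort_radius - d) / comfort_radius)
    if not intrusions:
        return 0.0
    return -sum(intrusions) / len(intrusions)

def shaped_reward(extrinsic, phi_prev, phi_next, gamma=0.99, weight=1.0):
    """Potential-based shaping: extrinsic + weight * (gamma * phi(s') - phi(s))."""
    return extrinsic + weight * (gamma * phi_next - phi_prev)
```

For example, stepping from a state with one pedestrian at 0.25 m to a state with none inside the comfort radius yields a positive shaping bonus, while the reverse move is penalized. Because the shaping term is a difference of potentials, it is the kind of term that is known to leave the optimal policy of the underlying task unchanged.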