CrowdSurfer: Sampling Optimization Augmented with Vector-Quantized Variational AutoEncoder for Dense Crowd Navigation
Naman Kumar, Antareep Singha, Laksh Nanwani, Dhruv Potdar, Tarun R, Fatemeh Rastgar, Simon Idoko, Arun Kumar Singh, K. Madhava Krishna
TL;DR
CrowdSurfer tackles dense-crowd navigation by decoupling trajectory generation from real-time optimization. It learns a discrete latent prior over expert trajectories with a VQ-VAE, samples diverse priors via a perception-conditioned PixelCNN, and refines them at inference time using the PRIEST planner to satisfy kinematic and collision constraints, achieving real-time performance (~$20\,\mathrm{Hz}$) for a 5-second horizon. The approach improves over the state-of-the-art DRL-VO in success rate and travel time across multiple environments, including unseen maps and changing layouts, and remains robust when no valid global plan is guaranteed. Real-world experiments with a Turtlebot2, a Husky A200, and a custom wheelchair illustrate practical deployment challenges and the potential for broader adoption, while ablations highlight the benefits of environment-conditioned priors. Overall, CrowdSurfer offers a middle-ground framework that leverages expert demonstrations to inform long-horizon local planning with strong robustness to dynamics and map changes.
Abstract
Navigation amongst densely packed crowds remains a challenge for mobile robots. The complexity increases further if the environment layout changes, making the previously computed global plan infeasible. In this paper, we show that it is possible to dramatically enhance crowd navigation by improving the local planner alone. Our approach combines generative modelling with inference-time optimization to generate sophisticated long-horizon local plans at interactive rates. More specifically, we train a Vector Quantized Variational AutoEncoder to learn a prior over the expert trajectory distribution conditioned on the perception input. At run-time, samples from this prior are used to initialize a sampling-based optimizer for further refinement. Our approach does not require any sophisticated prediction of dynamic obstacles and yet provides state-of-the-art performance. In particular, we compare against the recent DRL-VO approach and show a 40% improvement in success rate and a 6% improvement in travel time.
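
The run-time step described above, where samples from a learned prior initialize a sampling-based optimizer, can be illustrated with a minimal sketch. This is not the PRIEST planner and not the authors' code: it substitutes a generic cross-entropy-style loop, and every name, cost term, and weight here (`trajectory_cost`, `refine`, the obstacle radius, the noisy straight-line "priors" standing in for decoded VQ-VAE samples) is a hypothetical choice made for illustration. Kinematic constraints are not modelled.

```python
import numpy as np


def trajectory_cost(trajs, goal, obstacles, radius=0.5):
    """Cost of a batch of 2D trajectories with shape (N, T, 2).

    All three terms and their weights are illustrative assumptions.
    """
    # Distance from the final waypoint to the goal.
    goal_cost = np.linalg.norm(trajs[:, -1] - goal, axis=-1)
    # Distances from every waypoint to every obstacle: shape (N, T, M).
    dists = np.linalg.norm(
        trajs[:, :, None, :] - obstacles[None, None, :, :], axis=-1
    )
    # Penalize penetration into a disc of the given radius around obstacles.
    collision_cost = np.maximum(0.0, radius - dists).sum(axis=(1, 2))
    # Penalize second differences (a proxy for acceleration).
    smooth_cost = (np.diff(trajs, n=2, axis=1) ** 2).sum(axis=(1, 2))
    return goal_cost + 100.0 * collision_cost + 0.1 * smooth_cost


def refine(prior_trajs, goal, obstacles, iters=10, elite_frac=0.2, seed=0):
    """Cross-entropy-style refinement, warm-started from prior samples.

    Assumes all prior trajectories share the same start waypoint.
    """
    rng = np.random.default_rng(seed)
    n = prior_trajs.shape[0]
    n_elite = max(2, int(elite_frac * n))
    samples = prior_trajs.copy()
    best, best_cost = None, np.inf
    for _ in range(iters):
        costs = trajectory_cost(samples, goal, obstacles)
        order = np.argsort(costs)
        # Track the best trajectory evaluated so far.
        if costs[order[0]] < best_cost:
            best_cost = costs[order[0]]
            best = samples[order[0]].copy()
        # Refit a per-waypoint Gaussian to the elite set and resample.
        elites = samples[order[:n_elite]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-3
        samples = mean + std * rng.standard_normal((n,) + mean.shape)
        samples[:, 0] = prior_trajs[0, 0]  # keep the start waypoint fixed
    return best, best_cost


# Stand-in for decoded prior samples: noisy straight lines to the goal.
rng = np.random.default_rng(1)
start, goal = np.zeros(2), np.array([5.0, 0.0])
obstacles = np.array([[2.5, 0.0], [3.5, 0.4]])
line = np.linspace(start, goal, 20)  # shape (20, 2)
priors = line[None] + 0.3 * rng.standard_normal((64, 20, 2))
priors[:, 0] = start
best, best_cost = refine(priors, goal, obstacles)
```

Because the first iteration scores the prior samples themselves, the returned cost can never be worse than that of the best prior sample. The intuition this sketch tries to capture is that a better initial distribution gives the optimizer a better starting point; how closely this matches the behaviour of the actual planner is not something the sketch can show.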
