Learning Strategies For Successful Crowd Navigation
Rajshree Daulatabad, Serena Nath
TL;DR
This work tackles socially compliant crowd navigation for autonomous robots by collecting real-world robot–human interaction data through teleoperation and learning behavior frame by frame. It proposes a vision-based pipeline that builds 64x64 occupancy maps from camera images, uses OpenPose for human detection, and applies a convolutional neural network (CNN) that regresses the robot's next action (speed and turn), trained to minimize mean squared error. The study details the hardware setup (a Pioneer P3-DX), a categorization of crowd scenarios, a complete data-collection and processing pipeline (including calibration, perspective correction, and normalization), and a CNN architecture trained with extensive hyperparameter tuning. Although hardware issues limited real-world autonomous evaluation, the paper demonstrates the feasibility of learning pro-social crowd navigation from real interaction data and outlines concrete steps for deploying the approach and extending it to broader environments.
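The map-building step can be pictured with a minimal sketch. It assumes that OpenPose has already produced one ground-contact pixel per person (for example, the midpoint of the two ankle keypoints), that calibration has produced a 3x3 image-to-floor homography `H`, and that each grid cell covers a hypothetical 0.25 m. The names `apply_homography` and `build_occupancy_map`, the cell size, and the robot-at-bottom-centre layout are illustrative choices, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

GRID_SIZE = 64      # 64x64 occupancy map, as described in the text
CELL_METERS = 0.25  # hypothetical cell size; the real value is not given here

Grid = List[List[float]]


def apply_homography(H: Sequence[Sequence[float]], u: float, v: float) -> Tuple[float, float]:
    """Perspective-correct an image pixel (u, v) to floor coordinates (x, y)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return x / w, y / w


def build_occupancy_map(
    foot_pixels: Sequence[Tuple[float, float]],
    H: Sequence[Sequence[float]],
    grid_size: int = GRID_SIZE,
    cell: float = CELL_METERS,
) -> Grid:
    """Rasterize detected people into a grid_size x grid_size map with values in [0, 1].

    Assumed layout: the robot sits at the bottom-centre cell, x is the lateral
    offset in metres (positive = right) and y is the forward distance in metres.
    """
    grid = [[0.0] * grid_size for _ in range(grid_size)]
    for u, v in foot_pixels:
        x, y = apply_homography(H, u, v)
        col = math.floor(x / cell) + grid_size // 2
        row = grid_size - 1 - math.floor(y / cell)
        # People outside the mapped area (e.g. behind the robot) are dropped.
        if 0 <= row < grid_size and 0 <= col < grid_size:
            grid[row][col] = 1.0  # binary occupancy is already normalized to [0, 1]
    return grid


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    occ = build_occupancy_map([(1.0, 2.0), (0.0, -1.0)], identity)
    print(sum(map(sum, occ)))  # prints 1.0: one person in front, one behind and dropped
```

In practice `H` would come from the calibration step, for instance from four correspondences between image pixels and measured floor positions. A binary cell per person is the simplest encoding; a real pipeline might instead mark a small footprint around each detection.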
Abstract
Teaching autonomous mobile robots to navigate human crowds successfully is a challenging task. It requires not only planning but also adherence to social norms, which may differ from one context to another. Here we focus on crowd navigation, using a neural network to learn specific strategies in situ with a robot. This allows us to account for human behavior and reactions toward a real robot, and to learn strategies specific to the various scenarios in that context. A CNN takes a top-down image of the scene as input and outputs the next action for the robot to take in terms of speed and angle. We present the method and experimental results, and we evaluate our approach quantitatively.
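To make the input/output contract concrete, here is a deliberately tiny, dependency-free forward pass: one bank of 3x3 convolution filters, ReLU, global average pooling, and a linear head with two outputs (speed and angle), scored with mean squared error. The kernel count, the pooling choice, and the function names are hypothetical; the paper's actual CNN architecture is not described in this section, and a real implementation would use a deep-learning framework and learn the weights by gradient descent.

```python
import random
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Single-channel 'valid' 2D cross-correlation (what CNN libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out


def relu(grid: Grid) -> Grid:
    return [[max(0.0, v) for v in row] for row in grid]


def global_avg_pool(grid: Grid) -> float:
    return sum(sum(row) for row in grid) / (len(grid) * len(grid[0]))


def forward(
    image: Grid,
    kernels: Sequence[Grid],
    head_w: Sequence[Sequence[float]],
    head_b: Sequence[float],
) -> Tuple[float, float]:
    """Map a top-down image to (speed, angle): conv -> ReLU -> pool -> linear head."""
    features = [global_avg_pool(relu(conv2d_valid(image, k))) for k in kernels]
    speed = sum(w * f for w, f in zip(head_w[0], features)) + head_b[0]
    angle = sum(w * f for w, f in zip(head_w[1], features)) + head_b[1]
    return speed, angle


def mse(preds: Sequence[Tuple[float, float]], targets: Sequence[Tuple[float, float]]) -> float:
    """Mean squared error over both outputs (speed and angle) of every sample."""
    errs = [(p - t) ** 2 for pred, targ in zip(preds, targets) for p, t in zip(pred, targ)]
    if not errs:
        raise ValueError("need at least one prediction/target pair")
    return sum(errs) / len(errs)


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[rng.random() for _ in range(64)] for _ in range(64)]  # stand-in occupancy map
    kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(4)]
    head_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
    pred = forward(image, kernels, head_w, [0.0, 0.0])
    print(pred, mse([pred], [(0.3, 0.0)]))  # untrained weights, so the loss is arbitrary
```

Global average pooling keeps the head size independent of the 64x64 input, which keeps the sketch short; the network described in the paper may use fully connected layers or a different depth instead.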
