Learning Strategies For Successful Crowd Navigation

Rajshree Daulatabad, Serena Nath

TL;DR

This work tackles socially compliant crowd navigation for autonomous robots by collecting real-world robot–human interaction data through teleoperation and learning frame-by-frame behavior. It proposes a vision-based pipeline that builds 64x64 occupancy maps from a camera, uses OpenPose for human detection, and trains a convolutional neural network to regress the robot's next action (speed and turn) under a mean squared error loss. The study details the hardware setup (Pioneer P3-DX), a triage of crowd scenarios, a complete data-collection and processing pipeline (including calibration, perspective correction, and normalization), and a CNN architecture trained with substantial hyperparameter tuning. Although real-world autonomous evaluation was limited by hardware issues, the paper demonstrates the feasibility of learning pro-social crowd navigation from real interaction data and outlines concrete steps to deploy and extend the approach to broader environments.

Abstract

Teaching autonomous mobile robots to successfully navigate human crowds is a challenging task. Not only does it require planning, but it also requires maintaining social norms, which may differ from one context to another. Here we focus on crowd navigation, using a neural network to learn specific strategies in-situ with a robot. This allows us to take into account human behavior and reactions toward a real robot, as well as to learn strategies that are specific to various scenarios in that context. A CNN takes a top-down image of the scene as input and outputs the next action for the robot to take in terms of speed and angle. Here we present the method and experimental results, and quantitatively evaluate our approach.
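
To make the input/output contract in the abstract concrete, the following is a minimal NumPy sketch of a convolutional regressor that maps one top-down map to a two-element action and scores it with mean squared error. It is not the authors' architecture, which is not specified in this section: the single 3x3 convolution layer, global average pooling, linear head, and the names `forward`, `mse`, `kernels`, `head_w`, and `head_b` are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def forward(occ_map, kernels, head_w, head_b):
    """Toy CNN forward pass: conv -> ReLU -> global average pool -> linear head.

    occ_map : (H, W) uint8 top-down map with values in [0, 255]
    kernels : (K, kh, kw) convolution filters (hypothetical sizes)
    head_w  : (2, K) weights of the linear head
    head_b  : (2,) bias of the linear head
    Returns a length-2 vector interpreted as [speed, angle].
    """
    x = occ_map.astype(np.float64) / 255.0                 # normalize to [0, 1]
    windows = sliding_window_view(x, kernels.shape[1:])    # (H-kh+1, W-kw+1, kh, kw)
    feat = np.einsum("ijab,kab->kij", windows, kernels)    # valid cross-correlation
    feat = np.maximum(feat, 0.0)                           # ReLU
    pooled = feat.mean(axis=(1, 2))                        # (K,) one value per filter
    return head_w @ pooled + head_b


def mse(pred, target):
    """Mean squared error between predicted and teleoperated actions."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
kernels = rng.normal(size=(4, 3, 3))
head_w = rng.normal(size=(2, 4))
head_b = np.array([0.5, 0.0])

action = forward(np.zeros((64, 64), dtype=np.uint8), kernels, head_w, head_b)
loss = mse(action, [0.4, 0.1])   # target would come from the teleoperator's command
```

In training, the target for each frame would be the action the teleoperator actually took, so minimizing this loss pushes the network toward imitating the recorded behavior.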

Paper Structure (24 sections, 9 equations, 8 figures)


Figures (8)

  • Figure 1: Raw camera footage prior to camera calibration.
  • Figure 2: Checkerboard calibration card used to recover camera and distortion parameters.
  • Figure 3: Three human subjects are detected with OpenPose, which highlights their joints. The area is free of obstacles, and the image is aligned such that the floor is flat with the image. The robot is visible on the left near the trash bins.
  • Figure 4: Purple lines are extensions of straight lines in the scene, used to find the optical center, which lies at their intersection. The white square represents the unoccupied space that the robot may roam. Green circles are the corrected detections of humans and robot markers. Red dotted circles are the detected neck key points.
  • Figure 5: An instance of an occupancy map used in the data. White circles represent people while the white area on the left represents a wall. The map is relative such that the robot is always in the middle and is always trying to move to the right side.
  • ...and 3 more figures
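
The robot-relative occupancy map described in the Figure 5 caption can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the mapped area size `EXTENT_M`, the drawn person radius `PERSON_R_M`, and the function name `occupancy_map` are hypothetical, and walls are omitted for brevity. Only the 64x64 grid size, the robot-centered frame, and the goal-to-the-right convention come from the text.

```python
import numpy as np

GRID = 64          # cells per side, as stated in the summary
EXTENT_M = 8.0     # assumed side length of the mapped area, in metres
PERSON_R_M = 0.3   # assumed radius drawn for each person, in metres


def occupancy_map(people_xy, robot_xy, goal_dir_rad):
    """Rasterize people into a robot-centred grid with the goal direction pointing right.

    people_xy    : iterable of (x, y) world positions in metres
    robot_xy     : (x, y) world position of the robot in metres
    goal_dir_rad : world-frame direction from robot to goal, in radians
    """
    grid = np.zeros((GRID, GRID), dtype=np.uint8)
    people = np.asarray(people_xy, dtype=float).reshape(-1, 2)

    # Translate so the robot sits at the origin (the map centre).
    rel = people - np.asarray(robot_xy, dtype=float)

    # Rotate by -goal_dir so the goal direction maps onto the +x axis (right).
    c, s = np.cos(-goal_dir_rad), np.sin(-goal_dir_rad)
    rot = np.array([[c, -s], [s, c]])
    rel = rel @ rot.T

    # Cell-centre coordinates in metres; row 0 is the top of the image (+y).
    cell = EXTENT_M / GRID
    centres = (np.arange(GRID) + 0.5) * cell - EXTENT_M / 2
    xs, ys = np.meshgrid(centres, -centres)

    # Draw each person as a filled white circle.
    for px, py in rel:
        grid[(xs - px) ** 2 + (ys - py) ** 2 <= PERSON_R_M ** 2] = 255
    return grid


# One person 2 m from the robot, exactly in the goal direction (world "north").
demo = occupancy_map([(1.0, 3.0)], robot_xy=(1.0, 1.0), goal_dir_rad=np.pi / 2)
```

Because the frame is re-centered and re-oriented on every frame, the network never needs to learn where the goal is: it always lies toward the right edge of the map.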